Abstract

We consider a class of nonlinear mappings $F_{A,N}$ in $\mathbb{R}^N$ indexed by symmetric random matrices
$A \in \mathbb{R}^{N\times N}$ with independent entries. Within spin glass theory, special cases of these mappings
correspond to iterating the TAP equations and were studied by Erwin Bolthausen. Within information
theory, they are known as 'approximate message passing' algorithms.

We study the high-dimensional (large $N$) behavior of the iterates of $F$ for polynomial functions
$F$, and prove that it is universal, i.e. it depends only on the first two moments of the entries of
$A$, under a subgaussian tail condition. As an application, we prove the universality of a certain
phase transition arising in polytope geometry and compressed sensing. This solves, for a broad
class of random projections, a conjecture by David Donoho and Jared Tanner.

1 Introduction and main results 

Let $A \in \mathbb{R}^{N\times N}$ be a random Wigner matrix, i.e. a random matrix with i.i.d. entries $A_{ij}$ satisfying
$E\{A_{ij}\} = 0$ and $E\{A_{ij}^2\} = 1/N$. Considerable effort has been devoted to studying the distribution of
the eigenvalues of such a matrix [AGZ09, BS05, TV12]. The universality phenomenon is a striking
recurring theme in these studies. Roughly speaking, many asymptotic properties of the joint eigenvalue
distribution are independent of the distribution of the entries as long as the latter has the prescribed
first two moments, and satisfies certain tail conditions. We refer to [AGZ09, BS05, TV12] and references
therein for a selection of such results. Universality is extremely useful because it allows one to
compute asymptotics for one distribution of the entries (typically, for Gaussian entries) and then export
the results to a broad class of distributions.

In this paper we are concerned with random matrix universality, although we do not focus on
properties of the eigenvalues. Given $A \in \mathbb{R}^{N\times N}$, and an initial condition $x^0 \in \mathbb{R}^N$ independent of $A$, we
consider the sequence $(x^t)_{t\ge 0}$, $t \in \mathbb{N}$, defined by letting, for $t \ge 0$,

$$x^{t+1} = A\, f(x^t; t) - b_t\, f(x^{t-1}; t-1)\,, \qquad b_t \equiv \frac{1}{N}\, \mathrm{div}\big(f(x;t)\big)\Big|_{x = x^t}\,. \tag{1.1}$$

Here, div denotes the divergence operator and, for each $t \ge 0$, $f(\,\cdot\,; t) : \mathbb{R}^N \to \mathbb{R}^N$ is a separable
function, i.e. $f(z; t) = (f_1(z_1; t), \dots, f_N(z_N; t))$ where the functions $f_i(\,\cdot\,; t) : \mathbb{R} \to \mathbb{R}$ are polynomials
of bounded degree. In particular $b_t = N^{-1} \sum_{i=1}^N f_i'(x_i^t; t)$.
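
The iteration (1.1) is straightforward to simulate. The following is a minimal sketch and not a construction taken from the paper: the helper names `wigner` and `amp_orbit`, the polynomial $f(x) = 0.5x + 0.3x^2$, the all-ones initial condition, and the convention $f(x^{-1}; -1) = 0$ at the first step are all assumptions made for illustration. It runs the same iteration with a Gaussian matrix and with a $\pm 1$ matrix having the same first two moments, so that the empirical second moments of $x^t$ can be compared.

```python
import numpy as np

def wigner(N, rng, kind="gaussian"):
    """Symmetric N x N matrix with zero diagonal and independent entries, E[A_ij^2] = 1/N."""
    if kind == "gaussian":
        G = rng.standard_normal((N, N))
    else:  # Rademacher (+1/-1) entries: same mean and variance as the Gaussian case
        G = rng.choice([-1.0, 1.0], size=(N, N))
    U = np.triu(G, k=1)              # keep the strict upper triangle only
    return (U + U.T) / np.sqrt(N)    # symmetrize and rescale the variance to 1/N

def amp_orbit(A, x0, f, fprime, steps):
    """Iterate x^{t+1} = A f(x^t) - b_t f(x^{t-1}) with b_t = mean(f'(x^t))."""
    x = np.asarray(x0, dtype=float).copy()
    f_prev = np.zeros_like(x)        # assumed convention: f(x^{-1}) = 0
    for _ in range(steps):
        fx = f(x)
        b = fprime(x).mean()         # (1/N) div f for a separable f
        x, f_prev = A @ fx - b * f_prev, fx
    return x

# Hypothetical polynomial nonlinearity and its derivative.
f = lambda x: 0.5 * x + 0.3 * x**2
fprime = lambda x: 0.5 + 0.6 * x

rng = np.random.default_rng(0)
N = 1000
x0 = np.ones(N)
second_moment = {}
for kind in ("gaussian", "rademacher"):
    A = wigner(N, rng, kind)
    x2 = amp_orbit(A, x0, f, fprime, steps=2)
    second_moment[kind] = float(np.mean(x2**2))
print(second_moment)
```

At finite $N$ the two printed values differ by random fluctuations; the universality statement concerns the limit $N \to \infty$ at fixed $t$.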

The present paper is concerned with the asymptotic distribution of $x^t$ as $N \to \infty$ with $t$ fixed,
and establishes the following results:

Universality. As $N \to \infty$, the finite-dimensional marginals of the distribution of $x^t$ are asymptotically
insensitive to the distribution of the entries $A_{ij}$.

State evolution. The entries of $x^t$ are asymptotically Gaussian with zero mean, and variance that
can be explicitly computed through a one-dimensional recursion, that we will refer to as state
evolution.

Phase transitions in polytope geometry. As an application, we use state evolution to prove
universality of a phase transition in polytope geometry, with connections to compressed sensing.
This solves, for a broad class of random matrices with independent entries, a conjecture
put forward by David Donoho and Jared Tanner in [Don05a, DT11].

In order to illustrate the usefulness of the first two technical results, we start the presentation of our
results from the third one.

1.1 Universality of polytope neighborliness

A polytope $Q$ is said to be centrosymmetric if $x \in Q$ implies $-x \in Q$. Following [Don05b, Don05a]
we say that such a polytope is $k$-neighborly if the condition below holds:

(I) Every subset of $k$ vertices of $Q$ which does not contain an antipodal pair spans a $(k-1)$-dimensional
face.

The neighborliness of $Q$ is the largest value of $k$ for which this condition holds. The prototype of
neighborly polytope is the $\ell_1$ ball $C^n \equiv \{x \in \mathbb{R}^n : \|x\|_1 \le 1\}$, whose neighborliness is indeed equal
to $n$.

It was shown in a series of papers [Don05b, Don05a, DT05b, DT05a, DT09] that polytope neighborliness
has tight connections with the geometric properties of random point clouds, and with
sparsity-seeking methods to solve underdetermined systems of linear equations. The latter are in
turn central in a number of applied domains, including model selection for data analysis and compressed
sensing. For the reader's convenience, these connections will be briefly reviewed in Section

El 

Intuitive images of low-dimensional polytopes suggest that 'typical' polytopes are not neighborly:
already selecting $k = 2$ vertices does lead to a segment that connects them and passes through the
interior of $Q$. This conclusion is spectacularly wrong in high dimension. Natural random constructions
lead to polytopes whose neighborliness scales linearly in the dimension. Motivated by the above
applications, and following [Don05b, Don05a, DT05b, DT05a], we focus here on a weaker notion of
neighborliness. Roughly speaking, this corresponds to the largest $k$ such that most subsets of $k$
vertices of $Q$ span a $(k-1)$-dimensional face. In order to formalize this notion, we denote by $\mathfrak{F}(Q; \ell)$
the number of $\lfloor \ell \rfloor$-dimensional faces of $Q$.
Definition 1. Let $Q = \{Q_n\}_{n \ge 0}$ be a sequence of centrosymmetric polytopes indexed by $n$ where $Q_n$
has $2n$ vertices and has dimension $m = m(n)$: $Q_n \subseteq \mathbb{R}^m$. We say that $Q$ has weak neighborliness
$\rho \in (0,1)$ if for any $\xi > 0$,

$$\lim_{n\to\infty} \frac{\mathfrak{F}(Q_n; m(n)\rho(1-\xi))}{\mathfrak{F}(C^n; m(n)\rho(1-\xi))} = 1\,,$$

$$\lim_{n\to\infty} \frac{\mathfrak{F}(Q_n; m(n)\rho(1+\xi))}{\mathfrak{F}(C^n; m(n)\rho(1+\xi))} = 0\,.$$

If the sequence $Q$ is random, we say that $Q$ has weak neighborliness $\rho$ (in probability) if the above
limits hold in probability.

In other words, a sequence of polytopes $\{Q_n\}_{n\ge 0}$ has weak neighborliness $\rho$ if, for large $n$, the
$m$-dimensional polytope $Q_n$ has close to the maximum possible number of $k$-faces, for all $k < m\rho(1-\xi)$.

Note 1. Previously the neighborliness of a polytope was defined to be the largest integer $k$
satisfying condition (I). However, in our definition weak neighborliness refers to the fraction $k/n$.
This is due to the fact that weak neighborliness is defined in the limit $n \to \infty$.

The existence of weakly neighborly polytope sequences is clear when $m(n) = n$ since in this case
we can take $Q_n = C^n$ with $\rho = 1$, but the existence is highly non-trivial when $m$ is only a fraction
of $n$.

It comes indeed as a surprise that this is a generic situation, as demonstrated by the following
construction. For a matrix $A \in \mathbb{R}^{m\times n}$, and $S \subseteq \mathbb{R}^n$, let $AS \equiv \{Ax \in \mathbb{R}^m : x \in S\}$. In particular,
$AC^n$ is the centrosymmetric $m$-dimensional polytope obtained by projecting the $n$-dimensional $\ell_1$
ball to $m$ dimensions. The following result was proved in [Don05a].

Theorem 1 (Donoho, 2005). There exists a function $\rho_* : (0,1) \to (0,1)$ such that the following
holds. Fix $\delta \in (0,1)$. For each $n \in \mathbb{N}$, let $m(n) = \lfloor n\delta \rfloor$ and define $A(n) \in \mathbb{R}^{m(n)\times n}$ to be a random
matrix with i.i.d. Gaussian entries.

Then, the sequence of polytopes $\{A(n)C^n\}_{n\ge 0}$ has weak neighborliness $\rho_*(\delta)$ in probability.

A characterization of the curve $\delta \mapsto \rho_*(\delta)$ was provided in [Don05a], but we omit it here since a
more explicit expression will be given below.

The proof of Theorem 1 is based on exact expressions for the number of faces $\mathfrak{F}(A(n)C^n; \ell)$.
These are in turn derived from earlier works in polytope geometry by Affentranger and Schneider
[AS92] and by Vershik and Sporyshev [VS92]. This approach relies in a fundamental way on the
invariance of the distribution of $A(n)$ under rotations.

Motivated by applications to data analysis and signal processing, Donoho and Tanner [DT11]
carried out extensive numerical simulations for random polytopes of the form $A(n)C^n$ for several
choices of the distribution of $A(n)$. They formulated a universality hypothesis according to which
the conclusion of Theorem 1 holds for a far broader class of random matrices. The results of their
numerical simulations were consistent with this hypothesis.

Here we establish the first rigorous result indicating universality of polytope neighborliness for a
broad class of random matrices. Define the curve $(\delta, \rho_*(\delta))$, $\delta \in (0,1)$, parametrically by letting, for
$\alpha \in (0, \infty)$:

$$\delta = \frac{2\phi(\alpha)}{\alpha + 2\big(\phi(\alpha) - \alpha\,\Phi(-\alpha)\big)}\,, \tag{1.2}$$

$$\rho = 1 - \frac{\alpha\,\Phi(-\alpha)}{\phi(\alpha)}\,, \tag{1.3}$$

where $\phi(z) = e^{-z^2/2}/\sqrt{2\pi}$ is the Gaussian density and $\Phi(x) \equiv \int_{-\infty}^{x} \phi(z)\, dz$ is the Gaussian
distribution. Explicitly, if the above functions on the right-hand side of Eqs. (1.2), (1.3) are denoted by
$f_\delta(\alpha)$, $f_\rho(\alpha)$, then¹ $\rho_*(\delta) \equiv f_\rho(f_\delta^{-1}(\delta))$.
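
The parametric definition can be evaluated numerically by inverting $f_\delta$. The sketch below is illustrative only: it assumes the reading $\delta = 2\phi(\alpha)/(\alpha + 2(\phi(\alpha) - \alpha\Phi(-\alpha)))$ and $\rho = 1 - \alpha\Phi(-\alpha)/\phi(\alpha)$ of Eqs. (1.2), (1.3), the function names are hypothetical, and plain bisection is used in place of any method the authors may have had in mind. Bisection is applicable because $f_\delta$ is strictly decreasing on $[0, \infty)$.

```python
import math

def phi(z):
    """Standard Gaussian density."""
    return math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard Gaussian distribution function, via the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def f_delta(alpha):
    """Right-hand side of (1.2), as read above (an assumption)."""
    return 2.0 * phi(alpha) / (alpha + 2.0 * (phi(alpha) - alpha * Phi(-alpha)))

def f_rho(alpha):
    """Right-hand side of (1.3), as read above (an assumption)."""
    return 1.0 - alpha * Phi(-alpha) / phi(alpha)

def rho_star(delta, tol=1e-12):
    """rho_*(delta) = f_rho(f_delta^{-1}(delta)) for delta in (0, 1), by bisection."""
    lo, hi = 0.0, 1.0
    while f_delta(hi) > delta:       # enlarge the bracket until f_delta(hi) <= delta
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f_delta(mid) > delta:     # f_delta is decreasing, so the root lies to the right
            lo = mid
        else:
            hi = mid
    return f_rho(0.5 * (lo + hi))

for delta in (0.2, 0.5, 0.8):
    print(delta, rho_star(delta))
```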

Here we extend the scope of Theorem 1 from Gaussian matrices to matrices with independent
subgaussian² entries (not necessarily identically distributed).

Theorem 2. Fix $\delta \in (0,1)$. For each $n \in \mathbb{N}$, let $m(n) = \lfloor n\delta \rfloor$ and define $A(n) \in \mathbb{R}^{m(n)\times n}$ to be a
random matrix with independent subgaussian entries, with zero mean, unit variance, and common
scale factor $s$ independent of $n$. Further assume $A_{ij}(n) = \tilde{A}_{ij}(n) + \nu_0\, G_{ij}(n)$ where $\nu_0 > 0$ is
independent of $n$ and $\{G_{ij}(n)\}_{i\in[m], j\in[n]}$ is a collection of i.i.d. $\mathsf{N}(0,1)$ random variables independent
of $\tilde{A}(n)$.

Then the sequence of polytopes $\{A(n)C^n\}_{n\ge 0}$ has weak neighborliness $\rho_*(\delta)$ in probability.

It is likely that this theorem can be improved in two directions. First, a milder tail condition than
subgaussianity is probably sufficient. Second, we are assuming that the distribution of $A_{ij}$ has an
arbitrarily small Gaussian component. This is not necessary for the upper bound on neighborliness,
and appears to be an artifact of the proof of the lower bound.

The proof of Theorem 2 is provided in Section 5. By comparison, the most closely related result
towards universality is by Adamczak, Litvak, Pajor, and Tomczak-Jaegermann [ALPT11]. For
a class of matrices $A(n)$ with i.i.d. columns, these authors prove that $A(n)C^n$ has neighborliness
scaling linearly with $n$. This however does not suggest that a limit weak neighborliness exists, and
is universal, as established instead in Theorem 2.

At the other extreme, universality of compressed sensing phase transitions can be conjectured
from the results of the non-rigorous replica method [KWT09, RFG09].



1.2 Universality of iterative algorithms 

We will consider here and below a setting that is somewhat more general than the one described
by Eq. (1.1). Following the terminology of [DMM09], we will refer to such an iteration as the
approximate message passing (AMP) iteration/algorithm.

We generalize the iteration (1.1) to take place in the vector space $V_{q,N} \equiv (\mathbb{R}^q)^N \simeq \mathbb{R}^{N\times q}$.
Given a vector $x \in V_{q,N}$, we shall most often regard it as an $N$-vector with entries in $\mathbb{R}^q$, namely
$x = (x_1, \dots, x_N)$, with $x_i \in \mathbb{R}^q$. Components of $x_i \in \mathbb{R}^q$ will be indicated as $(x_i(1), \dots, x_i(q)) \equiv x_i$.

Given a matrix $A \in \mathbb{R}^{N\times N}$, we let it act on $V_{q,N}$ in the natural way, namely for $v', v \in V_{q,N}$
letting $v' = Av$ be given by $v'_i = \sum_{j=1}^N A_{ij} v_j$ for all $i \in [N]$. Here and below $[N] \equiv \{1, \dots, N\}$ is the
set of first $N$ integers. In other words we identify $A$ with the Kronecker product $A \otimes I_{q\times q}$.
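
The identification of $A$ with $A \otimes I_{q\times q}$ can be checked numerically. In this small sketch (the dimensions and variable names are arbitrary choices, not taken from the paper) an element of $V_{q,N}$ is stored as an $N \times q$ array whose $i$-th row is $v_i$, and stacking the rows gives the vector in $\mathbb{R}^{Nq}$ on which the Kronecker product acts.

```python
import numpy as np

rng = np.random.default_rng(0)
N, q = 5, 3
A = rng.standard_normal((N, N))
v = rng.standard_normal((N, q))       # row i holds v_i in R^q

# Natural action: v'_i = sum_j A_ij v_j, i.e. an ordinary matrix product on the rows.
Av_rows = A @ v

# Same map written as the Kronecker product acting on the stacked vector (v_1, ..., v_N).
Av_kron = np.kron(A, np.eye(q)) @ v.reshape(N * q)

print(np.allclose(Av_rows.reshape(N * q), Av_kron))
```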

¹ It is easy to show that $f_\delta(\alpha)$ is strictly decreasing in $\alpha \in [0, \infty)$, with $f_\delta(0) = 1$, $\lim_{\alpha\to\infty} f_\delta(\alpha) = 0$, and hence $f_\delta^{-1}$
is well defined on $[0, 1]$. Further properties of this curve can be found in [DMM09, DMM11].

² See Eq. (1.7) for the definition of subgaussian random variables.
Definition 2. An AMP instance is a triple $(A, \mathcal{F}, x^0)$ where:

1. $A \in \mathbb{R}^{N\times N}$ is a symmetric matrix with $A_{ii} = 0$ for all $i \in [N]$.

2. $\mathcal{F} = \{f^k : k \in [N]\}$ is a collection of mappings $f^k : \mathbb{R}^q \times \mathbb{N} \to \mathbb{R}^q$, $(x, t) \mapsto f^k(x, t)$ that are
locally Lipschitz in their first argument;

3. $x^0 \in V_{q,N}$ is an initial condition.

Given $\mathcal{F} = \{f^k : k \in [N]\}$, we define $f(\,\cdot\,; t) : V_{q,N} \to V_{q,N}$ by letting $v' = f(v; t)$ be given by
$v'_i = f^i(v_i; t)$ for all $i \in [N]$.

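Definition 3. The approximate message passing orbit corresponding to the instance $(A, \mathcal{F}, x^0)$ is
the sequence of vectors $\{x^t\}_{t\ge 0}$, $x^t \in V_{q,N}$, defined as follows, for $t \ge 0$,

$$x^{t+1} = A\, f(x^t; t) - \mathsf{B}_t\, f(x^{t-1}; t-1)\,. \tag{1.4}$$

Here $\mathsf{B}_t : V_{q,N} \to V_{q,N}$ is the linear operator defined by letting, for $v' = \mathsf{B}_t v$,

$$v'_i = \Big( \sum_{j\in[N]} A_{ij}^2\, \frac{\partial f^j}{\partial x}(x_j^t, t) \Big)\, v_i\,, \tag{1.5}$$

with $\frac{\partial f^j}{\partial x}$ denoting the Jacobian matrix of $f^j(\,\cdot\,; t) : \mathbb{R}^q \to \mathbb{R}^q$.

The above definition can also be summarized by the following expression for the evolution of a
single coordinate under AMP:

$$x_i^{t+1} = \sum_{j\in[N]} A_{ij}\, f^j(x_j^t, t) - \sum_{j\in[N]} A_{ij}^2\, \frac{\partial f^j}{\partial x}(x_j^t, t)\, f^i(x_i^{t-1}, t-1)\,. \tag{1.6}$$

Notice that Eq. (1.1) corresponds to the special case $q = 1$, in which we replaced $A_{ij}^2$ by $E\{A_{ij}^2\} = 1/N$
for simplicity of exposition.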

Recall that a centered random variable $X$ is subgaussian with scale factor $\sigma^2$ if, for all $\lambda > 0$, we
have

$$E\big(e^{\lambda X}\big) \le e^{\sigma^2 \lambda^2 / 2}\,. \tag{1.7}$$
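
As a concrete instance of (1.7), a random variable taking the values $\pm a$ with equal probability has $E(e^{\lambda X}) = \cosh(\lambda a) \le e^{a^2\lambda^2/2}$, so it is subgaussian with scale factor $a^2$. The short check below is only an illustration (the function names and the grid of $\lambda$ values are arbitrary); with $a = 1/\sqrt{N}$ it corresponds to $\pm 1/\sqrt{N}$ matrix entries having scale factor $1/N$.

```python
import math

def rademacher_mgf(lam, a=1.0):
    """E exp(lam * X) for X uniform on {-a, +a}."""
    return math.cosh(lam * a)

def subgaussian_bound(lam, sigma2):
    """Right-hand side of (1.7) with scale factor sigma2."""
    return math.exp(sigma2 * lam**2 / 2.0)

N = 100
a = 1.0 / math.sqrt(N)
lams = [0.1 * k for k in range(1, 200)]
bound_holds = all(rademacher_mgf(lam, a) <= subgaussian_bound(lam, 1.0 / N) for lam in lams)
print(bound_holds)
```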

Definition 4. Let $\{(A(N), \mathcal{F}_N, x^{0,N})\}_{N\ge 1}$ be a sequence of AMP instances indexed by the dimension
$N$, with $A(N)$ a random matrix and $x^{0,N}$ a random vector. We say that the sequence is a $(C, d)$-regular
(or, for short, regular) polynomial sequence if

1. For each $N$, the entries $(A_{ij}(N))_{1\le i<j\le N}$ are independent centered random variables. Further
they are subgaussian with common scale factor $C/N$.

2. For each $N$, the functions $f^i(\,\cdot\,; t)$ in $\mathcal{F}_N$ (possibly random, as long as they are independent
from $A(N)$, $x^{0,N}$) are polynomials with maximum degree $d$ and coefficients bounded by $C$.

3. For each $N$, $A(N)$ and $x^{0,N}$ are independent. Further, we have $\sum_{i=1}^N \exp\{\|x_i^{0,N}\|_2^2 / C\} \le NC$
with probability converging to one as $N \to \infty$.
We state now our universality result for the algorithm (1.4).

Theorem 3. Let $(A(N), \mathcal{F}_N, x^{0,N})_{N\ge 1}$ and $(\tilde{A}(N), \mathcal{F}_N, x^{0,N})_{N\ge 1}$ be any two $(C, d)$-regular polynomial
sequences of instances, that differ only in the distribution of the random matrices $A(N)$ and
$\tilde{A}(N)$.

Denote by $\{x^t\}_{t\ge 0}$, $\{\tilde{x}^t\}_{t\ge 0}$ the corresponding AMP orbits. Assume further that for all $N$ and
all $i < j$, $E\{A_{ij}^2\} = E\{\tilde{A}_{ij}^2\}$. Then, for any set of polynomials $\{p_{N,i}\}_{N\ge 0, 1\le i\le N}$, $p_{N,i} : \mathbb{R}^q \to \mathbb{R}$, with
degree bounded by $D$ and coefficients bounded by $B$ for all $N$ and $i \in [N]$, we have

$$\lim_{N\to\infty} \frac{1}{N} \sum_{i=1}^N \Big\{ E\, p_{N,i}(x_i^t) - E\, p_{N,i}(\tilde{x}_i^t) \Big\} = 0\,. \tag{1.8}$$

1.3 State evolution 

Theorem 3 establishes that the behavior of the sequence $\{x^t\}_{t\ge 0}$ is, in the high-dimensional limit,
insensitive to the distribution of the entries of the random matrix $A$. In order to characterize this
limit, we need to make some assumption on the collection of functions $\mathcal{F}_N$.

Definition 5. We say that the sequence of AMP instances $\{(A(N), \mathcal{F}_N, x^{0,N})\}_{N\ge 0}$ is polynomial
and converging (or simply converging) if it is $(C, d)$-regular and there exist: (i) An integer $k$; (ii) A
symmetric matrix $W \in \mathbb{R}^{k\times k}$ with non-negative entries; (iii) A function $g : \mathbb{R}^q \times \mathbb{R}^{\tilde{q}} \times [k] \times \mathbb{N} \to \mathbb{R}^q$,
with $g(x, Y, a, t) = (g_1(x, Y, a, t), \dots, g_q(x, Y, a, t))$ and, for each $r \in [q]$, $a \in [k]$, $t \in \mathbb{N}$, $g_r(\,\cdot\,, a, t)$ a
polynomial with degree $d$ and coefficients bounded by $C$; (iv) $k$ probability measures $P_1, \dots, P_k$ on
$\mathbb{R}^{\tilde{q}}$, with $P_a$ a finite mixture of (possibly degenerate) Gaussians for each $a \in [k]$; (v) For each $N$, a
finite partition $C_1^N \cup C_2^N \cup \dots \cup C_k^N = [N]$; (vi) $k$ positive semidefinite matrices $\widehat{\Sigma}_1^0, \dots, \widehat{\Sigma}_k^0 \in \mathbb{R}^{q\times q}$,
such that the following happens.

1. For each $a \in [k]$, we have $\lim_{N\to\infty} |C_a^N|/N = c_a \in (0, 1)$.

2. For each $N \ge 0$, each $a \in [k]$ and each $i \in C_a^N$, we have $f^i(x, t) = g(x, Y(i), a, t)$ where
$Y(1), \dots, Y(N)$ are independent random variables with $Y(i) \sim P_a$ whenever $i \in C_a^N$ for some
$a \in [k]$.

3. For each $N$, the entries $\{A_{ij}(N)\}_{1\le i<j\le N}$ are independent subgaussian random variables with
scale factor $C/N$, $E A_{ij} = 0$, and, for $i \in C_a^N$ and $j \in C_b^N$, $E\{A_{ij}^2\} = W_{ab}/N$.

4. For each $a \in [k]$, in probability,

$$\lim_{N\to\infty} \frac{1}{|C_a^N|} \sum_{i\in C_a^N} g(x_i^0, Y(i), a, 0)\, g(x_i^0, Y(i), a, 0)^{\mathsf{T}} = \widehat{\Sigma}_a^0\,. \tag{1.9}$$

With a slight abuse of notation, we will sometimes denote a converging sequence by $\{(A(N), g, x^{0,N})\}_{N\ge 0}$.
We use capital letters to denote the $Y(i)$'s to emphasize that they are random and do not change
across iterations.

Our next result establishes that the low-dimensional marginals of $\{x^t\}$ are asymptotically Gaussian.
State evolution characterizes the covariance of these marginals. For each $t \ge 1$, state evolution
defines a set of $k$ positive semidefinite matrices $\Sigma^t = (\Sigma_1^t, \Sigma_2^t, \dots, \Sigma_k^t)$, with $\Sigma_a^t \in \mathbb{R}^{q\times q}$. These are
obtained by letting, for each $t \ge 1$

$$\Sigma_a^t = \sum_{b=1}^k c_b\, W_{ab}\, \widehat{\Sigma}_b^{t-1}\,, \tag{1.10}$$

$$\widehat{\Sigma}_a^t = E\big\{ g(Z_a^t, Y_a, a, t)\, g(Z_a^t, Y_a, a, t)^{\mathsf{T}} \big\}\,, \tag{1.11}$$

for all $a \in [k]$. Here $Y_a \sim P_a$, $Z_a^t \sim \mathsf{N}(0, \Sigma_a^t)$ and $Y_a$ and $Z_a^t$ are independent.

Theorem 4. Let $(A(N), \mathcal{F}_N, x^0)_{N\ge 0}$ be a polynomial and converging sequence of AMP instances,
and denote by $\{x^t\}_{t\ge 0}$ the corresponding AMP sequence. Then for each $t \ge 1$, each $a \in [k]$, and each
locally Lipschitz function $\psi : \mathbb{R}^q \times \mathbb{R}^{\tilde{q}} \to \mathbb{R}$ such that $|\psi(x, y)| \le K(1 + \|y\|_2^2 + \|x\|_2^2)$, we have, in
probability,

$$\lim_{N\to\infty} \frac{1}{|C_a^N|} \sum_{j\in C_a^N} \psi(x_j^t, Y(j)) = E\{\psi(Z_a, Y_a)\}\,, \tag{1.12}$$

where $Z_a \sim \mathsf{N}(0, \Sigma_a^t)$ is independent of $Y_a \sim P_a$.
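
The state evolution recursion can be iterated numerically. The following is a minimal sketch under simplifying assumptions that are not part of the statement above: it treats only the scalar case $q = 1$, takes $g$ independent of $Y$, reads the recursion as $\Sigma_a^t = \sum_b c_b W_{ab} \widehat{\Sigma}_b^{t-1}$ and $\widehat{\Sigma}_a^t = E\{g(Z_a^t, a, t)^2\}$ with $Z_a^t \sim \mathsf{N}(0, \Sigma_a^t)$, and computes the Gaussian expectation by Gauss-Hermite quadrature, which is exact for polynomial $g$ of low enough degree. The function names and the example values are hypothetical.

```python
import numpy as np

def gaussian_expectation(h, var, n_nodes=40):
    """E[h(Z)] for Z ~ N(0, var), by Gauss-Hermite quadrature (probabilists' weight)."""
    nodes, weights = np.polynomial.hermite_e.hermegauss(n_nodes)
    return float(np.sum(weights * h(np.sqrt(var) * nodes)) / np.sqrt(2.0 * np.pi))

def state_evolution(g, c, W, sigma_hat0, steps):
    """Scalar (q = 1) state evolution with g(z, a, t) independent of Y.

    Returns the list [Sigma^1, ..., Sigma^steps], each an array of length k.
    """
    c = np.asarray(c, dtype=float)
    W = np.asarray(W, dtype=float)
    sigma_hat = np.asarray(sigma_hat0, dtype=float)
    history = []
    for t in range(1, steps + 1):
        sigma = W @ (c * sigma_hat)   # Sigma_a^t = sum_b c_b W_ab hat(Sigma)_b^{t-1}
        sigma_hat = np.array([
            gaussian_expectation(lambda z, a=a: g(z, a, t) ** 2, sigma[a])
            for a in range(len(c))
        ])                            # hat(Sigma)_a^t = E[g(Z_a^t, a, t)^2]
        history.append(sigma)
    return history

# Single-block example (k = 1, c = W = 1) with the polynomial g(z) = 0.5 z + 0.3 z^2.
g = lambda z, a, t: 0.5 * z + 0.3 * z**2
history = state_evolution(g, c=[1.0], W=[[1.0]], sigma_hat0=[0.64], steps=2)
print(history)
```

For this choice $E\{g(Z)^2\} = 0.25\,\sigma^2 + 0.27\,\sigma^4$ when $Z \sim \mathsf{N}(0, \sigma^2)$, which gives a closed form against which the quadrature can be compared.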

We conclude by mentioning that, following [DMM09], generalizations of the algorithm (1.4) were
studied by several groups [Sch10, Ran11, MAYB11], for a number of applications. Universality results
analogous to the one proved here are expected to hold for such generalizations as well.

1.4 Outline of the paper 

The paper is organized as follows. After some preliminary facts and notations in Section 2, Section
3 considers the AMP iteration (1.4) and proves Theorems 3 and 4. In order to achieve our goal, we
introduce two different iterations whose analysis provides useful intermediate steps. We also prove a
generalization of Theorem 4 to estimate functions of messages at two distinct times, $\psi(x_i^t, x_i^s, Y(i))$.

Section 4 proves a generalization of Theorem 4 to the case of rectangular (non-symmetric) matrices
$A$. This is achieved by effectively embedding the rectangular matrix into a larger symmetric
matrix and applying our results for symmetric matrices.

The generalization to rectangular matrices is finally used in Section 5 to prove our result on
the universality of polytope neighborliness, Theorem 2. This is done via a correspondence with
compressed sensing reconstruction established in [Don05a], and a sharp analysis of an AMP iteration
that solves this reconstruction problem.

2 Notations and basic simplifications 

We will always view vectors as column vectors. The transpose of vector $v$ is the row vector indicated
by $v^{\mathsf{T}}$. Analogously, the transpose of a matrix (or vector) $M$ is denoted by $M^{\mathsf{T}}$. For a vector $v \in \mathbb{R}^m$,
we denote its $\ell_p$ norm, $p \ge 1$, by $\|v\|_p \equiv (\sum_{i=1}^m |v_i|^p)^{1/p}$. This is extended in the usual way to $p = \infty$.
We will often omit the subscript if $p = 2$. For a matrix $M$, we denote by $\|M\|_p$ the corresponding $\ell_p$
operator norm. The standard scalar product of $u, v \in \mathbb{R}^m$ is denoted by $\langle u, v\rangle = \sum_{i=1}^m u_i v_i$. Given
$v \in \mathbb{R}^m$, $w \in \mathbb{R}^n$, we denote by $[v, w] \in \mathbb{R}^{m+n}$ the (column) vector obtained by concatenating $v$
and $w$. The identity matrix is denoted by $I$, or $I_{m\times m}$ if the dimensions need to be specified. The
indicator function is $\mathbb{1}(\,\cdot\,)$. The set of first $m$ integers is indicated by $[m] = \{1, \dots, m\}$. Finally, given
$x = (x(1), x(2), \dots, x(q)) \in \mathbb{R}^q$ and $m = (m(1), \dots, m(q)) \in \mathbb{N}^q$, we write

$$x^m \equiv \prod_{r=1}^q x(r)^{m(r)}\,. \tag{2.1}$$

Following the common practice, degenerate Gaussian distributions will be considered Gaussian,
without further qualification. In particular, any distribution with finite support in $\mathbb{R}^k$ is a finite
mixture of Gaussians.

In our proof of Theorem 4 we will make use of the following simplification, which lightens the
notation somewhat.

Remark 1. For proving Theorem 4 it is sufficient to consider the case in which $g : (x, Y, a, t) \mapsto
g(x, Y, a, t)$ is independent of $Y$.

Proof. We can assume without loss of generality that the measures $P_a$ are Gaussian. Indeed if, for
instance, $P_a$ is a mixture of $\ell$ Gaussians, $P_a = w_1 P_{a,1} + w_2 P_{a,2} + \dots + w_\ell P_{a,\ell}$, then we can effectively
replace the partition element $C_a^N$ by a finer partition $C_{a,1}^N, \dots, C_{a,\ell}^N$, whereby $C_{a,1}^N \cup \dots \cup C_{a,\ell}^N = C_a^N$
and $(|C_{a,1}^N|, \dots, |C_{a,\ell}^N|)$ are multinomial with parameters $(w_1, \dots, w_\ell)$. Notice that this finer partition
is random, but $|C_{a,i}^N|/N \to c_a w_i$ almost surely, and therefore the theorem applies.

Assume therefore that the $P_a$ are Gaussian. By replacing $g(x, Y, a, t)$ by $\hat{g}(x, Y, a, t) = g(x, Q_a Y +
v_a, a, t)$ for suitable matrices $Q_a$ and vectors $v_a$, we can always assume $Y_a \sim \mathsf{N}(0, I_{\tilde{q}\times\tilde{q}})$ for all $a$.
Assume therefore $Y_a \sim \mathsf{N}(0, I_{\tilde{q}\times\tilde{q}})$. Enlarge the space by letting $k' = k + \tilde{q}$, $N' = (\tilde{q} + 1)N$ and
$C_a^{N'} = \{N\ell + 1, \dots, N(\ell + 1)\}$ for $a = k + \ell > k$, while $C_a^{N'} = C_a^N$ for $a \le k$. We further let $q' = q + \tilde{q}$
and define new functions $g' : \mathbb{R}^{q'} \times \mathbb{R}^{\tilde{q}} \times [k'] \times \mathbb{N} \to \mathbb{R}^{q'}$ independent of the second argument ($Y$) as
follows. For $x \in \mathbb{R}^q$, $\tilde{x} \in \mathbb{R}^{\tilde{q}}$, we let

$$g'_r\big((x, \tilde{x}), Y, a, t\big) = g_r(x, \tilde{x}, a, t) \quad \text{for } r \in \{1, \dots, q\},\ a \in \{1, \dots, k\}\,,$$
$$g'_r\big((x, \tilde{x}), Y, a, t\big) = 0 \quad \text{for } r \in \{q + 1, \dots, q + \tilde{q}\},\ a \in \{1, \dots, k\}\,,$$
$$g'_r\big((x, \tilde{x}), Y, a, t\big) = 0 \quad \text{for } r \in \{1, \dots, q\},\ a \in \{k + 1, \dots, k + \tilde{q}\}\,,$$
$$g'_{q+\ell}\big((x, \tilde{x}), Y, k + \ell', t\big) = \mathbb{1}(\ell = \ell') \quad \text{for } \ell, \ell' \in \{1, \dots, \tilde{q}\}\,.$$

We further use a matrix $A'$ constructed as follows: $A'_{ij} = A_{ij}$ for $i, j \le N$, and $A'_{ij} \sim \mathsf{N}(0, 1/N)$ if
$i > N$ or $j > N$. (Notice that $E\{(A'_{ij})^2\} = 2/N'$ but this amounts just to an overall rescaling
and is of course immaterial.) Clearly the functions $g'$ do not depend on $Y$ as claimed. Further,
$\tilde{x}_i \sim \mathsf{N}(0, I_{\tilde{q}\times\tilde{q}})$ at all iterations. Hence the new iteration is identical to the original one when
restricted on $\{x_i(r) : i \le N, r \le q\}$. □

3 Proofs of Theorems 3 and 4

In this section we consider the AMP iteration (1.4), and prove Theorem 3 and Theorem 4, and indeed
generalize the latter.
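We extend the state evolution (1.10) by defining for each $t \ge s \ge 0$ and for all $a \in [k]$, a positive
semidefinite matrix $\Sigma_a^{t,s} \in \mathbb{R}^{(2q)\times(2q)}$ as follows. For boundary conditions, we set the $2q \times 2q$ block
matrices

$$\widehat{\Sigma}_a^{0,0}\,, \qquad \widehat{\Sigma}_a^{t,0}\,, \qquad \widehat{\Sigma}_a^{0,t} \qquad \text{[block entries illegible in the source]} \tag{3.1}$$

with $\widehat{\Sigma}_a^t$ defined per Eq. (1.11). For any $s, t \ge 1$, we set recursively

$$\Sigma_a^{t,s} = \sum_{b=1}^k c_b\, W_{ab}\, \widehat{\Sigma}_b^{t-1,s-1}\,, \tag{3.2}$$

$$\widehat{\Sigma}_a^{t,s} = E\{X_a X_a^{\mathsf{T}}\}\,, \quad X_a \equiv [g(Z_a^t, Y_a, a, t),\, g(Z_a^s, Y_a, a, s)]\,, \quad (Z_a^t, Z_a^s) \sim \mathsf{N}(0, \Sigma_a^{t,s})\,. \tag{3.3}$$

Recall that $[g(Z_a^t, Y_a, a, t), g(Z_a^s, Y_a, a, s)] \in \mathbb{R}^{2q}$ is the vector obtained by concatenating $g(Z_a^t, Y_a, a, t)$
and $g(Z_a^s, Y_a, a, s)$. Note that taking $s = t$ in (3.2), we recover the recursion for $\Sigma_a^t$ given by Eq. (1.10).
Namely for all $t$ we have

$$\Sigma_a^{t,t} = \begin{pmatrix} \Sigma_a^t & \Sigma_a^t \\ \Sigma_a^t & \Sigma_a^t \end{pmatrix}\,. \tag{3.4}$$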

Theorem 5. Let $\{(A(N), \mathcal{F}_N, x^{0,N})\}_{N\ge 1}$ be a polynomial and converging sequence of instances and
denote by $\{x^t\}_{t\ge 0}$ the corresponding AMP orbit.

Fix $s, t \ge 1$. If $s \ne t$, further assume that the initial condition $x^{0,N}$ is obtained by letting
$x_i^{0,N} \sim Q_a$ independent and identically distributed, with $Q_a$ a finite mixture of Gaussians for each
$a$. Then, for each $a \in [k]$, and each locally Lipschitz function $\psi : \mathbb{R}^q \times \mathbb{R}^q \times \mathbb{R}^{\tilde{q}} \to \mathbb{R}$ such that
$|\psi(x, x', y)| \le K(1 + \|y\|_2^2 + \|x\|_2^2 + \|x'\|_2^2)$, we have, in probability,

$$\lim_{N\to\infty} \frac{1}{|C_a^N|} \sum_{j\in C_a^N} \psi(x_j^t, x_j^s, Y(j)) = E\big[\psi(Z_a^t, Z_a^s, Y_a)\big]\,,$$

where $(Z_a^t, Z_a^s) \sim \mathsf{N}(0, \Sigma_a^{t,s})$ is independent of $Y_a \sim P_a$.

Throughout this section, we will assume that $\{(A(N), \mathcal{F}_N, x^{0,N})\}$, $\{(\tilde{A}(N), \mathcal{F}_N, x^{0,N})\}$, etc. are
$(C, d)$-regular polynomial sequences of AMP instances. We will often omit explicit mention of this
hypothesis. Notice that Theorem 3 holds per realization of the functions $\mathcal{F}_N$. Because of this, and
of Remark 1, we will consider hereafter $\mathcal{F}_N$ to be non-random.

The rest of this section is organized as follows. In Section 3.1 we introduce two new iterations
that are useful intermediary steps for our analysis. We show that the corresponding variables
admit representations as sums over trees in Section 3.2 and use them to prove basic properties of these
recursions in Sections 3.3, 3.4 and 3.5. Theorems 3 and 5 are then proved in Sections 3.6 and 3.7. Because
of Eq. (3.4), Theorem 4 follows as a special case of Theorem 5. Indeed, we will show that both
statements are equivalent through a reduction argument. Depending on the application, Theorem 5
might be a more convenient formulation of the state evolution and will be used in Section 4.
3.1 Message passing iteration

We define two new message passing sequences corresponding to the instance $(A, \mathcal{F}, x^{0,N})$. For each
$i \in [N]$ we use the short notation $[N] \setminus i$ to denote the set $[N] \setminus \{i\}$. We now define the sequence of
vectors $(z_{i\to j}^t)_{t\in\mathbb{N}}$, where for each $i \ne j \in [N]$, $z_{i\to j}^t$ is a vector in $\mathbb{R}^q$; equivalently, for each $t \in \mathbb{N}$, we
can see $(z_{i\to j}^t)$ as an $N \times N$ matrix with entries in $\mathbb{R}^q$ (diagonal elements are never used). The initial
condition is denoted by $z_{i\to j}^0 \in \mathbb{R}^q$ for any $i, j \in [N]$ and is independent of $j$, such that $z_{i\to j}^0 = x_i^{0,N}$
for all $j \ne i$. The $r$-th coordinate of the vector $z_{i\to j}^{t+1}$ is defined by the following recursion for $t \ge 0$,

$$z_{i\to j}^{t+1}(r) = \sum_{\ell\in[N]\setminus j} A_{\ell i}\, f_r^\ell(z_{\ell\to i}^t, t)\,, \tag{3.5}$$

where $f_r^\ell(\,\cdot\,, t) : \mathbb{R}^q \to \mathbb{R}$ is the $r$-th coordinate of $f^\ell(\,\cdot\,, t)$.

We also define for each $i \in [N]$ and $t \ge 0$, the vector $z_i^{t+1} \in \mathbb{R}^q$ by

$$z_i^{t+1}(r) = \sum_{\ell\in[N]} A_{\ell i}\, f_r^\ell(z_{\ell\to i}^t, t)\,. \tag{3.6}$$

Our first result establishes universality of the moments of $z_i^t$ for polynomial sequences of instances.

Proposition 6. Let $(A(N), \mathcal{F}_N, x^{0,N})_{N\ge 1}$ and $(\tilde{A}(N), \mathcal{F}_N, x^{0,N})_{N\ge 1}$ be any two $(C, d)$-regular polynomial
sequences of AMP instances, that differ only in the distribution of the random matrices $A(N)$
and $\tilde{A}(N)$. Assume that for all $N$ and all $i < j$, $E\{A_{ij}^2\} = E\{\tilde{A}_{ij}^2\}$. Denote by $z_i^t$ the orbit (respectively
$\tilde{z}_i^t$) defined by (3.6) while iterating (3.5) with matrix $A$ (respectively $\tilde{A}$). Then for any $t \ge 1$
and any $m = (m(1), \dots, m(q)) \in \mathbb{N}^q$, there exists $K$ independent of $N$ such that, for any $i \in [N]$:

$$\Big| E\big[(z_i^t)^m\big] - E\big[(\tilde{z}_i^t)^m\big] \Big| \le K N^{-1/2}\,. \tag{3.7}$$

The proof of this proposition is provided in Section 3.3.

Note 2. In this statement and in the rest of this section, $K$ is always understood as a function
of $d, t, q, m, C$ which may vary from line to line but which is independent of $N$.

Our second message passing sequence is defined as follows: for a $(C, d)$-regular sequence of instances
$(A(N), \mathcal{F}_N, x^{0,N})_{N\ge 1}$, we define for each $N$ an i.i.d. sequence of $N \times N$ random matrices
$\{A^t\}_{t\in\mathbb{N}}$ such that $A^0 = A(N)$. Then we define $(y_{i\to j}^t)$ by $y_{i\to j}^0 = x_i^{0,N}$ and for $t \ge 0$

$$y_{i\to j}^{t+1}(r) = \sum_{\ell\in[N]\setminus j} A_{\ell i}^t\, f_r^\ell(y_{\ell\to i}^t, t)\,, \tag{3.8}$$

and

$$y_i^{t+1}(r) = \sum_{\ell\in[N]} A_{\ell i}^t\, f_r^\ell(y_{\ell\to i}^t, t)\,. \tag{3.9}$$

The asymptotic analysis of $y^t$ is particularly simple because an independent random matrix $A^t$ is
used at each iteration. In particular, it is easy to establish state evolution for $y^t$. Our next result
shows that $y^t$ provides a good approximation for $z^t$.
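
For $q = 1$ the message passing recursion has a compact vectorized form. The sketch below is an illustration rather than the authors' implementation: it assumes the reading $z_{i\to j}^{t+1} = \sum_{\ell \ne j} A_{\ell i} f(z_{\ell\to i}^t)$ and $z_i^{t+1} = \sum_{\ell} A_{\ell i} f(z_{\ell\to i}^t)$ of the recursion, uses a single time-independent polynomial $f$, and stores the message $z_{\ell\to i}$ in entry `Z[l, i]` of an $N \times N$ array. The function name and the example values are hypothetical.

```python
import numpy as np

def message_passing_step(A, Z, f):
    """One step of the message passing recursion for q = 1.

    Z[l, i] stores z_{l -> i}^t. Returns (Z_next, z_next), where
    Z_next[i, j] = z_{i -> j}^{t+1} and z_next[i] = z_i^{t+1}.
    """
    M = A * f(Z)                  # M[l, i] = A_{l i} f(z_{l -> i}^t)
    s = M.sum(axis=0)             # s[i] = sum over all l (A_ii = 0 removes l = i)
    Z_next = s[:, None] - M.T     # drop the term l = j from the sum
    return Z_next, s

rng = np.random.default_rng(0)
N = 30
U = np.triu(rng.standard_normal((N, N)), k=1)
A = (U + U.T) / np.sqrt(N)        # symmetric, zero diagonal
x0 = rng.standard_normal(N)
f = lambda z: 0.5 * z + 0.3 * z**2

Z0 = np.repeat(x0[:, None], N, axis=1)   # z_{i -> j}^0 = x_i^0, independent of j
Z1, z1 = message_passing_step(A, Z0, f)
Z2, z2 = message_passing_step(A, Z1, f)
```

Storing all $N^2$ messages costs $O(N^2)$ memory per step, which is why the message passing form is used here as an analytical device and only for small illustrative $N$.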



Proposition 7. Let $(A(N), \mathcal{F}_N, x^{0,N})_{N\ge 1}$ be a $(C, d)$-regular polynomial sequence of instances. Let
$z_i^t$ and $y_i^t$ be the sequences of vectors obtained by iterating (3.5)-(3.6) and (3.8)-(3.9) respectively.
Then for any $t \ge 1$ and any $m = (m(1), \dots, m(q)) \in \mathbb{N}^q$, there exists $K$ independent of $N$ such that,
for any $i \in [N]$:

$$\Big| E\big[(z_i^t)^m\big] - E\big[(y_i^t)^m\big] \Big| \le K N^{-1/2}\,.$$

The proof of this proposition is provided in Section 3.4.

Finally, recall that we defined the sequences $(x_i^t)_{t\in\mathbb{N}}$ with $x_i^t \in \mathbb{R}^q$, by $x_i^0$ and for $t \ge 0$,

$$x_i^{t+1}(r) = \sum_{\ell\in[N]} A_{\ell i}\, f_r^\ell(x_\ell^t, t) - \sum_{\ell\in[N]} A_{\ell i}^2 \sum_{s=1}^q f_s^i(x_i^{t-1}, t-1)\, \frac{\partial f_r^\ell}{\partial x(s)}(x_\ell^t, t)\,.$$

Proposition 8. Let $(A(N), \mathcal{F}_N, x^{0,N})_{N\ge 1}$ be a $(C, d)$-regular sequence of instances. Denote by
$\{x^t\}_{t\ge 0}$ the corresponding AMP sequence and by $\{z^t\}_{t\ge 0}$ the sequence defined by (3.6) while iterating
(3.5). Then for any $t \ge 1$ and $m(1), \dots, m(q) \ge 0$, there exists $K$ independent of $N$ such that,
for any $i \in [N]$:

$$\Big| E\big[(x_i^t)^m\big] - E\big[(z_i^t)^m\big] \Big| \le K N^{-1/2}\,.$$

The proof of this proposition is provided in Section 3.5.



3.2 Tree representation 

By assumption of Proposition 6 we have for each $\ell \in [N]$ and $r \in [q]$

$$f_r^\ell(z, t) = \sum_{i_1 + \dots + i_q \le d} c^\ell_{i_1,\dots,i_q}(r, t) \prod_{s=1}^q z(s)^{i_s}\,, \tag{3.10}$$

where each coefficient $c^\ell_{i_1,\dots,i_q}(r, t)$ belongs to $\mathbb{R}$ and has absolute value bounded by $C$ (uniformly in
$\ell \in [N]$, $i_1, \dots, i_q$, and $t \in \mathbb{N}$).

We now introduce families of finite rooted labeled trees that will allow us to get a simple expression
for the $z_{i\to j}^t(r)$'s and $z_i^t(r)$, see Lemma 1 below. For a vertex $v$ in a rooted tree $T$ different from the
root, we denote by $\pi(v)$ the parent of $v$ in $T$. We denote the root of $T$ by $\circ$. We consider that the
edges of $T$ are directed towards the root and write $(u \to v) \in E(T)$ if $\pi(u) = v$. The unlabeled trees
that we consider are such that the root and the leaves have degree one; each other vertex has degree
at most $d + 1$, i.e. has at most $d$ children. We now describe the possible labels on such trees. The
label of the root is in $[N]$, the label of a leaf is in $[N] \times [q] \times \mathbb{N}^q$ and all other vertices have a label
in $[N] \times [q]$. For a vertex $v$ different from the root or a leaf, we denote its label by $(\ell(v), r(v))$ and
call $\ell(v)$ its type and $r(v)$ its mark. The label (or type) of the root is also denoted by $\ell(\circ)$; the label
of a leaf $v$ is denoted by $(\ell(v), r(v), v[1], \dots, v[q])$. For a vertex $u \in T$, we denote by $|u|$ its generation
in the tree, i.e. its graph-distance from the root. Also for a vertex $u \in T$ (which is not a leaf), we
denote by $u[r]$ the number of children of $u$ with mark $r \in [q]$ (with the convention $u[0] = 0$). The
children of such a node are ordered with respect to their mark: the labels of the children of $u$ are then
$(\ell_1, 1), \dots, (\ell_{u[1]}, 1), (\ell_{u[1]+1}, 2), \dots, (\ell_{u[1]+\dots+u[q]}, q)$, where each $(\ell_{u[0]+\dots+u[i]+1}, \dots, \ell_{u[0]+\dots+u[i+1]})$ is a
$u[i+1]$-tuple with coordinates in $[N]$. We denote by $L(T)$ the set of leaves of a tree $T$, i.e. the set
of vertices of $T$ with no children. For $v \in L(T)$, its label $(\ell(v), r(v), v[1], \dots, v[q])$ is such that for all
$i \in [q]$, $v[i] \in \mathbb{N}$ and $v[1] + \dots + v[q] \le d$. We will distinguish between two types of leaves: those
with maximal depth $t = \max\{|v| : v \in L(T)\}$ and the remaining ones. If $v \in L(T)$ and $|v| \le t - 1$,
then we impose $v[1] = \dots = v[q] = 0$. This case corresponds to 'natural' leaves and since they have
no children, the notation is consistent with the notation introduced for other nodes of the tree. For
all other leaves, we do not make this assumption so that $v[1] + \dots + v[q]$ can take any value in $[d]$.
These leaves are 'artificial' and can be thought of as leaves resulting from cutting a larger tree after
generation $t$ so that the vector of the $v[r]$'s keeps the information on the number of children with
mark $r$ in the original tree.

Definition 9. We denote by $\mathcal{T}^t$ the set of labeled trees $T$ with $t$ generations as above that satisfy the
following conditions:

1. If $v_1 = \circ, v_2, \dots, v_k$ is a path starting from the root (i.e. with $\pi(v_{i+1}) = v_i$ for $i \ge 1$), then the
corresponding sequence of types $\ell(v_i)$ is non-backtracking, i.e., for any $1 \le i \le k - 2$, the three
labels $\ell(v_i)$, $\ell(v_{i+1})$ and $\ell(v_{i+2})$ are distinct.

2. If $u \in L(T)$ and $|u| \le t - 1$ (i.e. $u$ is a 'natural' leaf), then we have $u[1] + \dots + u[q] = 0$.

3. If $u \in L(T)$ and $|u| = t$ (i.e. $u$ is an 'artificial' leaf) then we have $u[1] + \dots + u[q] \le d$.

We also denote by $\overline{\mathcal{T}}^t$ the set of trees that satisfy conditions 2 and 3, but not necessarily the
non-backtracking condition 1. Hence $\mathcal{T}^t \subseteq \overline{\mathcal{T}}^t$.

We also let $\mathcal{U}^t$ be the same set of trees in which marks have been removed (i.e. we identify any
two trees that differ in the marks but not in the types). Analogously, $\overline{\mathcal{U}}^t$ is the set of trees in which marks
have been removed, but which do not necessarily satisfy the non-backtracking condition 1.

For a labeled tree $T \in \overline{\mathcal{T}}^t$ and a set of coefficients $c = (c^\ell_{i_1,\dots,i_q}(r, t))$, we define three weights:

$$A(T) \equiv \prod_{(u\to v)\in E(T)} A_{\ell(u)\ell(v)}\,,$$

$$\Gamma(T, c, t) \equiv \prod_{(u\to v)\in E(T)} c^{\ell(u)}_{u[1],\dots,u[q]}\big(r(u), t - |u|\big)\,,$$

$$x(T) \equiv \prod_{v\in L(T)} \prod_{s=1}^q \big(x^0_{\ell(v)}(s)\big)^{v[s]}\,.$$

We define

(a) $\mathcal{T}^t_{i\to j}(r) \subseteq \mathcal{T}^t$ the family of trees such that: (i) The root has type $i$; (ii) The root has only one
child, call it $v$; (iii) The type of $v$ is $\ell(v) \notin \{i, j\}$ and its mark is $r(v) = r$.

(b) $\mathcal{T}^t_i(r) \subseteq \mathcal{T}^t$ the family of trees such that: (i) The root has type $i$; (ii) The root has only one
child, call it $v$; (iii) The type of $v$ is $\ell(v) \ne i$ and its mark is $r(v) = r$.

The sets of trees $\mathcal{U}^t_i(r)$ and $\mathcal{U}^t_{i\to j}(r)$ are obtained from $\mathcal{T}^t_i(r)$ and $\mathcal{T}^t_{i\to j}(r)$ by removing marks.
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Lemma 1. Let (A(N),J 7 n,x ' N )n>i be a polynomial sequence of AMP instances. Denote by z* the 
orbit defined by \3. 6\) while iterating \3. 5\) with matrix A. Then, 



zU 3 {r) = A(T)T(T,c,t)x(X), (3.11) 

z\{r) = Y, A(T)T(T,c,t)x(T). (3.12) 
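For orientation, the orbit in Lemma 1 can be sketched numerically in the scalar case. This is a minimal sketch under stated assumptions: it takes the message-passing recursion to have the form $z^{t+1}_{i\to j} = \sum_{\ell \ne j} A_{\ell i} f(z^t_{\ell\to i})$ with $z^0_{i\to j} = x^0_i$, a zero-diagonal symmetric matrix, $q = 1$, and a time-independent $f$. The names `mp_step` and `node_values` are hypothetical.

```python
def mp_step(A, Z, f):
    """One step of the assumed recursion z_{i->j} = sum_{l != j} A[l][i] * f(z_{l->i}).

    Z[l][i] holds the current message from l to i. The returned list is
    indexed as new[i][j], the message from i to j.
    """
    n = len(A)
    return [
        [sum(A[l][i] * f(Z[l][i]) for l in range(n) if l != j) for j in range(n)]
        for i in range(n)
    ]


def node_values(A, Z, f):
    """Node quantities z_i = sum_l A[l][i] * f(z_{l->i}), with no excluded index."""
    n = len(A)
    return [sum(A[l][i] * f(Z[l][i]) for l in range(n)) for i in range(n)]


def square(x):
    return x * x


A = [[0, 1, 2],
     [1, 0, 3],
     [2, 3, 0]]
x0 = [1, 2, 3]
Z0 = [[x0[l] for _ in range(3)] for l in range(3)]   # z^0_{l->i} = x^0_l
Z1 = mp_step(A, Z0, square)
z1 = node_values(A, Z0, square)
```

The message $z_{i\to j}$ differs from the node value $z_i$ only by the single excluded term with $\ell = j$, which is what the two families of trees in (a) and (b) encode.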

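Proof. We first prove (3.11) by induction on $t$. For $t = 1$ we have, by definition,

$$z^1_{i\to j}(r) = \sum_{\ell\in[N]\setminus j}\ \sum_{i_1+\cdots+i_q\le d} A_{\ell i}\, c^{\ell}_{i_1,\dots,i_q}(r,0) \prod_{s=1}^{q} \big(x^0_\ell(s)\big)^{i_s}.$$

This expression corresponds exactly to equation (3.11) since trees in $\overline{\mathcal T}^1_{i\to j}(r)$ have a root with label
$i$ and with one child with label $(\ell, r, i_1, \dots, i_q)$ for some $\ell \notin \{i,j\}$ and $i_1 + \cdots + i_q \le d$.
To prove the induction, we start with Eq. (3.5), which yields

$$z^{t+1}_{i\to j}(r) = \sum_{\ell\in[N]\setminus j}\ \sum_{i_1+\cdots+i_q\le d} A_{\ell i}\, c^{\ell}_{i_1,\dots,i_q}(r,t) \prod_{s=1}^{q} \big(z^t_{\ell\to i}(s)\big)^{i_s}.$$

Using the induction hypothesis, we get

$$\prod_{s=1}^{q} \big(z^t_{\ell\to i}(s)\big)^{i_s} = \prod_{s=1}^{q}\Big(\sum_{T\in\overline{\mathcal T}^t_{\ell\to i}(s)} A(T)\Gamma(T,c,t)x(T)\Big)^{i_s} = \sum_{[\overline{\mathcal T}^t_{\ell\to i}(s)]^{i_1+\cdots+i_q}}\ \prod_{s=1}^{q}\prod_{k=1}^{i_s} A(T^s_k)\Gamma(T^s_k,c,t)x(T^s_k),$$

where the last expression is a sum over all $(i_1 + \cdots + i_q)$-tuples of trees with the first $i_1$ trees in
$\overline{\mathcal T}^t_{\ell\to i}(1)$, the following $i_2$ in $\overline{\mathcal T}^t_{\ell\to i}(2)$, and so on.
Hence, we get

$$z^{t+1}_{i\to j}(r) = \sum_{\ell\in[N]\setminus j}\ \sum_{i_1,\dots,i_q}\ \sum_{[\overline{\mathcal T}^t_{\ell\to i}(s)]^{i_1+\cdots+i_q}} A_{\ell i}\, c^{\ell}_{i_1,\dots,i_q}(r,t) \prod_{s=1}^{q}\prod_{k=1}^{i_s} A(T^s_k)\Gamma(T^s_k,c,t)x(T^s_k). \tag{3.13}$$

The claim now follows by observing that the set of trees in $\overline{\mathcal T}^{t+1}_{i\to j}(r)$ is in bijection with the set of
pairs constituted by a label $(\ell, r)$ with $\ell \notin \{i,j\}$ and an $(i_1 + \cdots + i_q)$-tuple of trees with exactly $i_s$
trees belonging to $\overline{\mathcal T}^t_{\ell\to i}(s)$ for $s \in [q]$. Indeed, take a root with label $i$ and one child, say $v$, with label
$(\ell, r)$ for some $\ell \notin \{i,j\}$ and with an $(i_1 + \cdots + i_q)$-tuple of trees with exactly $i_s$ trees belonging to
$\overline{\mathcal T}^t_{\ell\to i}(s)$ for $s \in [q]$. Now take $v$ as the root of these $(i_1 + \cdots + i_q)$ trees, the order in the tuple giving
the order of the subtrees of $v$. Note that the root of each subtree in $\overline{\mathcal T}^t_{\ell\to i}(s)$ has type $\ell$ and in the
resulting tree will get mark $r$. The proof of (3.12) follows by the same argument; the only change is
that in the sum in (3.13) we now need to include $\ell = j$. □

3.3 Proof of Proposition 6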

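We are now in position to prove Proposition 6.

Proof. For notational simplicity, we consider the case $m(r) = m$, and $m(s) = 0$ for all $s \in [q] \setminus r$.
Thanks to Lemma 1 we have

$$\mathbb E\big[(z^t_i(r))^m\big] = \sum_{T_1,\dots,T_m\in\overline{\mathcal T}^t_i(r)} \Big[\prod_{\ell=1}^{m}\Gamma(T_\ell,t)\Big]\, \mathbb E\Big[\prod_{\ell=1}^{m} x(T_\ell)\Big]\, \mathbb E\Big[\prod_{\ell=1}^{m} A(T_\ell)\Big]. \tag{3.14}$$

Since $c$ is fixed in this section, we omit to write it in $\Gamma(T, t)$. Notice that the general case
$m = (m(1), \dots, m(q)) \in \mathbb N^q$ admits a very similar representation whereby the sum over $T_1, \dots, T_m \in \overline{\mathcal T}^t_i(r)$
is replaced by sums over $T_1, \dots, T_{m(1)} \in \overline{\mathcal T}^t_i(1)$, $T_1, \dots, T_{m(2)} \in \overline{\mathcal T}^t_i(2)$, ..., $T_1, \dots, T_{m(q)} \in \overline{\mathcal T}^t_i(q)$.
The argument goes through essentially unchanged.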

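We have $\Gamma(T_\ell,t) \le C^{d^t}$. We first concentrate on the term $\mathbb E\big[\prod_{\ell=1}^m A(T_\ell)\big]$. Recall that, from the
subgaussian property of the entries of $A$: $\mathbb E(e^{\lambda A_{ij}}) \le e^{C\lambda^2/(2N)}$. Now using Lemma 12 from Appendix D we
get for all $i < j \in [N]$

$$\mathbb E\big\{|A_{ij}|^s\big\} \le 2\Big(\frac{s}{\mathrm e}\Big)^{s/2} C^{s/2} N^{-s/2}, \tag{3.15}$$

obtained by taking $\lambda = \sqrt{Ns/C}$.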
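A quick numerical sanity check of a moment bound of this type is possible in the Gaussian case. The sketch assumes the bound has the form $2(sC/\mathrm e)^{s/2} N^{-s/2}$, which is what the exponential-moment argument with $\lambda = \sqrt{Ns/C}$ yields, and compares it with the exact absolute moments of $N(0, 1/N)$ entries (scale factor $C/N$ with $C = 1$). The function names are hypothetical.

```python
from math import e, gamma, pi, sqrt


def gaussian_abs_moment(s, var):
    """E|X|^s for X ~ N(0, var), from the closed form involving the Gamma function."""
    return (2.0 * var) ** (s / 2.0) * gamma((s + 1) / 2.0) / sqrt(pi)


def subgaussian_moment_bound(s, C, N):
    """Assumed form of the bound: 2 * (s*C/e)^(s/2) * N^(-s/2)."""
    return 2.0 * (s * C / e) ** (s / 2.0) * N ** (-s / 2.0)


N, C = 100, 1.0
checks = [
    gaussian_abs_moment(s, 1.0 / N) <= subgaussian_moment_bound(s, C, N)
    for s in range(1, 21)
]
```

Both sides scale as $N^{-s/2}$, so the comparison does not depend on the value chosen for `N`.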

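For a labeled tree $T$, we define $\phi(T) = \{\phi(T)_{ij} \in \mathbb N,\ i < j \in [N]\}$ where $\phi(T)_{ij}$ is the number of
occurrences in $T$ of an edge $(u \to v)$ with endpoints having types $\ell(u), \ell(v) \in \{i,j\}$. Hence we have

$$A(T) = \prod_{i<j\in[N]} A_{ij}^{\phi(T)_{ij}}$$

and

$$\mathbb E\Big[\prod_{\ell=1}^{m} A(T_\ell)\Big] = \prod_{i<j\in[N]} \mathbb E\Big[A_{ij}^{\sum_{\ell=1}^{m}\phi(T_\ell)_{ij}}\Big]. \tag{3.16}$$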

Since the mean of each entry of the matrix $A$ is zero, in Equation (3.14) we can restrict the sum
to $T_1, \dots, T_m$ such that for all $i < j \in [N]$, $\sum_{\ell=1}^m \phi(T_\ell)_{ij} < 2$ implies $\sum_{\ell=1}^m \phi(T_\ell)_{ij} = 0$.

We now concentrate on the sum restricted to $T_1, \dots, T_m$ such that moreover there exists $i < j \in [N]$
such that $\sum_{\ell=1}^m \phi(T_\ell)_{ij} \ge 3$. For such an $m$-tuple $T_1, \dots, T_m$, we denote $\mu = \mu(T_1, \dots, T_m) =
\sum_{i<j}\sum_{\ell=1}^m \phi(T_\ell)_{ij}$. Let $G$ be the graph obtained by taking the union of the $T_\ell$'s and identifying
vertices $v$ with the same type $\ell(v)$. We define $e(T_1, \dots, T_m) = \sum_{i<j} \mathbb 1\big(\sum_{\ell=1}^m \phi(T_\ell)_{ij} \ge 1\big)$, which is the
number of edges counted without multiplicity in $G$. Since there exists $i < j$ with $\sum_{\ell=1}^m \phi(T_\ell)_{ij} \ge 3$,
we have $3 + 2(e(T_1, \dots, T_m) - 1) \le \mu$, i.e. $e(T_1, \dots, T_m) \le (\mu-1)/2$. Using Eq. (3.15) with $s = \phi_{ij} \equiv \sum_{\ell=1}^m \phi(T_\ell)_{ij}$, we get

$$\Big|\mathbb E\Big[\prod_{\ell=1}^{m} A(T_\ell)\Big]\Big| \le \prod_{i<j\in[N]} \mathbb E\big[|A_{ij}|^{\phi_{ij}}\big] \le 2^{(\mu-1)/2}\Big(\frac{C\mu}{\mathrm e}\Big)^{\mu/2} N^{-\mu/2}, \tag{3.17}$$



since in the product on the right-hand side of (3.16) there are $e(T_1, \dots, T_m)$ terms different from
one, i.e. at most $(\mu - 1)/2$ contributing terms.

We now compute an upper bound on

$$\sum^{(\mu)}_{T_1,\dots,T_m} \mathbb E\Big[\prod_{\ell=1}^{m} |x(T_\ell)|\Big],$$

where the sum $\sum^{(\mu)}$ ranges over $m$-tuples of trees in $\overline{\mathcal T}^t_i(r)$ such that $\sum_{i<j}\sum_{\ell=1}^m \phi(T_\ell)_{ij} = \mu$. First note
that for any $x \in \mathbb R^q$, we have for any $p \ge 2$:

||x||£ < ||x||f < max (expdlxHl),^) . 
Hence the condition $\sum_{i=1}^N \exp\big(\|x^{0,N}_i\|_2^2/C\big) \le NC$ ensures that for any $p \ge 2$,

$$\frac1N \sum_{i=1}^{N} \|x^{0,N}_i\|_p^p \le C_p.$$



Therefore, 



Ti,...,T m £=l 




2 



(m) m 

e ni^i ^ u m EE( i +i-r( s )i+---+K°' 7V ( s )i md ) i ( 3 - i§ ) 



md ^ N 



2 



md 



< U m (g + E^) 




0,iV ||fc 
k 



\ k=l 

where the last inequality is valid for $N > C$. To see why (3.18) is true, note that the graph $G$
is connected since all trees $T_1, \dots, T_m$ have the same type $i$ at the root. Therefore, the number of
vertices in $G$ is at most $e(T_1, \dots, T_m) + 1 \le (\mu-1)/2 + 1$. Since all $T_\ell$'s have the same root which has
type $i$, $G$ has at most $(\mu-1)/2$ distinct vertices which are distinct from the one associated to the root.
In particular, all trees $T_1, \dots, T_m$ together have at most $(\mu-1)/2$ distinct types among their leaves. The
factor $q^m$ comes from the fact that for each type $j$ there are at most $q^m$ choices for its $m$ marks $r$
corresponding to the $m$ trees. Now each leaf with type $j$ will contribute a factor $\prod_{s=1}^q |x^{0,N}_j(s)|^{n_s}$
with $n_s \le md$.
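The edge counting used in this argument can be made concrete. In the sketch below each tree is reduced to its list of edges, written as pairs of endpoint types; this reduction and the name `edge_statistics` are assumptions made here for illustration. The function returns the total multiplicity $\mu$, the number $e$ of distinct edges of the merged graph $G$, and the multiplicities themselves.

```python
from collections import Counter


def edge_statistics(trees):
    """Return (mu, e, phi) for a tuple of trees given as lists of (type_u, type_v) edges.

    phi maps each unordered pair of types to its multiplicity in the union,
    mu is the total number of edges, and e the number of distinct pairs.
    """
    phi = Counter(frozenset(edge) for tree in trees for edge in tree)
    mu = sum(phi.values())
    e = len(phi)
    return mu, e, phi


# The pair {1, 2} is covered three times and the pair {2, 3} twice.
trees = [[(1, 2), (2, 3)], [(1, 2), (2, 3)], [(1, 2)]]
mu, e, phi = edge_statistics(trees)
```

When no pair has multiplicity one and at least one pair has multiplicity three or more, the counts satisfy $3 + 2(e-1) \le \mu$, so a connected $G$ has at most $(\mu-1)/2 + 1$ vertices.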

It is now easy to conclude, since we can decompose the sum in (3.14) in two terms: the first term,
say $S_1(A)$, consists of the contribution of the $m$-tuples $T_1, \dots, T_m$ such that for all $i,j$, $\sum_{\ell=1}^m \phi(T_\ell)_{ij} \in
\{0,2\}$, while the second term, denoted by $S_2(A)$, consists of the remaining contribution. We have
$S_1(A) = S_1(\tilde A)$ and, using (3.17) and (3.18), we get:

\S 2 (A)\< Yl C* +1+ ^C'N^N-$ =o(N-v\, (3.19) 

fi<md t + 1 

which concludes the proof of Proposition 6. Here we used the fact that all values $\mu$, $q$, and $\{C_k\}_{k=1}^{md}$ are
independent of $N$. □

We end this section by showing that the term $S_1(A)$ can be further reduced. This result will be
useful in the sequel and we state it as the following lemma.



Lemma 2. Recall that we denoted by $S_1(A)$ the term in the sum (3.14) consisting of the contribution
of the $m$-tuples $T_1, \dots, T_m$ such that for all $i, j$, $\sum_{\ell=1}^m \phi(T_\ell)_{ij} \in \{0, 2\}$. We further decompose $S_1(A) =
T(A) + R(A)$ in two terms where the first term $T(A)$ corresponds to the sum over trees $T_1, \dots, T_m$ such
that the resulting graph $G$, obtained by taking the union of the $T_\ell$'s and identifying vertices $v$ with
the same type $\ell(v)$, is a tree (each edge having multiplicity two). Then there exists $K$ (independent
of $N$) such that:

$$\big|\mathbb E\big[z^t_i(r)^m\big] - T(A)\big| \le K N^{-1/2}, \qquad \big|\mathbb E\big[z^t_i(r)^m\big]\big| \le K.$$

Proof. We have by definition $\mathbb E[(z^t_i(r))^m] = T(A) + R(A) + S_2(A)$, so that thanks to (3.19), we need
only to show that $R(A) = O(N^{-1/2})$.

For any $m$-tuple $T_1, \dots, T_m$ such that for all $i < j$, $\sum_{\ell=1}^m \phi(T_\ell)_{ij} \in \{0, 2\}$, we have with the same
notation as above: $e(T_1, \dots, T_m) = \mu/2$. The number of vertices in $G$ is at most $1 + e(T_1, \dots, T_m)$
with equality if and only if $G$ is a tree (remember that $G$ is always connected as all trees $T_\ell$'s share
the same root). Hence for the cases that $G$ is not a tree it has at most $\mu/2 - 1$ vertices that serve as
leaves of a tree among $T_1, \dots, T_m$. By the same argument as above we get

$$|T(A)| \le \sum_{\mu\le m d^{t+1}} K N^{\mu/2} N^{-\mu/2} = O(1), \tag{3.20}$$

$$|R(A)| \le \sum_{\mu\le m d^{t+1}} K N^{\mu/2-1} N^{-\mu/2} = O(N^{-1}), \tag{3.21}$$

and the claim follows. □



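3.4 Proof of Proposition 7

The proof follows the same approach as for Proposition 6. For notational simplicity, we consider the
case $m(r) = m$, and $m(s) = 0$ for all $s \in [q] \setminus r$. The general case follows by the same argument. For
$y$, we are using a different matrix at each iteration and we need to define a new weight associated
to trees $T \in \overline{\mathcal T}^t$ as follows:

$$\bar A(T,t) = \prod_{(u\to v)\in E(T)} A^{t-|u|}_{\ell(u)\ell(v)}. \tag{3.22}$$

In the particular case where the sequence $\{A^t\}_{t\in\mathbb N}$ is constant (i.e., equal to $A$), this expression
reduces to $A(T)$ defined previously. Similarly to Lemma 1 for $z$, we have now

$$y^t_{i\to j}(r) = \sum_{T\in\overline{\mathcal T}^t_{i\to j}(r)} \bar A(T,t)\,\Gamma(T,c,t)\,x(T),$$
$$y^t_{i}(r) = \sum_{T\in\overline{\mathcal T}^t_{i}(r)} \bar A(T,t)\,\Gamma(T,c,t)\,x(T),$$

so that we get

$$\mathbb E\big[(y^t_i(r))^m\big] = \sum_{T_1,\dots,T_m\in\overline{\mathcal T}^t_i(r)} \Big[\prod_{\ell=1}^{m}\Gamma(T_\ell,c,t)\Big]\, \mathbb E\Big[\prod_{\ell=1}^{m} x(T_\ell)\Big]\, \mathbb E\Big[\prod_{\ell=1}^{m} \bar A(T_\ell,t)\Big]. \tag{3.23}$$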



For a labeled tree $T$, we define $\phi(T) = \{\phi(T)^g_{ij} \ge 0,\ i < j \in [N],\ g \ge 1\}$ where $\phi(T)^g_{ij}$ is the number
of occurrences in $T$ of an edge $(u \to v)$ with endpoints having labels $\ell(u), \ell(v) \in \{i,j\}$ and with
generation $|u| = g$. In particular, we have $\sum_g \phi(T)^g_{ij} = \phi(T)_{ij}$, which was defined in the proof of
Proposition 6. Hence we have, with $\mu = \sum_{i<j}\sum_{\ell=1}^m \phi(T_\ell)_{ij} \ge 3$,



E 



[ifm = n n 



i<j'G[JV] 9 



< 



(6) 
< 



n rH 



i<jG[AT] 9 



M\ (M-l)/2 



(3.24) 



where (a) holds since $\{A^t\}_{t\in\mathbb N}$ is an i.i.d. sequence with the same distribution as $A(N)$, and (b) follows
by the same argument as in (3.17). The inequality (3.24) implies that the bounds (3.19) and (3.21)
are still valid with the weight of a tree given by (3.22) (the term $\mathbb E\big[\prod_{\ell=1}^m x(T_\ell)\big]$ can be treated as in the
previous section).

As in the proof of Proposition 6, we define the graph $G$ obtained by taking the union of the $T_\ell$'s
and identifying vertices $v$ with the same type $\ell(v)$. By Lemma 2, we need only concentrate on the
term $T(A)$ corresponding to $m$-tuples $T_1, \dots, T_m$ such that each edge in $G$ has multiplicity 2 and
such that $G$ is a tree. Indeed, the proposition will follow once we prove

$$T(A) = \bar T(A), \tag{3.25}$$

where $T(A)$ was defined in Lemma 2 and $\bar T(A)$ is the corresponding term with the weight of a tree
given by (3.22). First note that for any $T_1, \dots, T_m$ such that $\mathbb E\big[\prod_{\ell=1}^m \bar A(T_\ell, t)\big] \ne 0$, we have

$$\mathbb E\Big[\prod_{\ell=1}^{m} \bar A(T_\ell,t)\Big] = \mathbb E\Big[\prod_{\ell=1}^{m} A(T_\ell)\Big].$$

Now suppose that we have $\mathbb E\big[\prod_{\ell=1}^m A(T_\ell)\big] \ne \mathbb E\big[\prod_{\ell=1}^m \bar A(T_\ell, t)\big]$. This can only happen if an
edge in $G$ connecting types, say, $i$ and $j$ has multiplicity 2 but appears at different generations in the
original trees $T_\ell$'s. Suppose this edge appears twice in, say, $T_1$ on the same branch and at different
generations, i.e. there exist $(a \to b)$ and $(c \to d) \in E(T_1)$ with $\{\ell(a),\ell(b)\} = \{\ell(c),\ell(d)\} = \{i,j\}$,
$|a| < |c|$ and the edge $(a \to b)$ is on the path that connects $c, d$ to the root. Thanks to the
non-backtracking property, these two edges cannot be adjacent, i.e. $a \ne d$. But then these edges create a
cycle in $G$, a contradiction. Suppose now that this edge appears in $T_1$ and $T_2$ in different generations,
i.e. there exist $(a \to b) \in E(T_1)$ and $(c \to d) \in E(T_2)$ with $\{\ell(a),\ell(b),\ell(c),\ell(d)\} = \{i,j\}$ and
$|a| < |c|$. Then the same reasoning shows that they will create a cycle in $G$ since $b$ and $d$ are
connected to the roots of $T_1$ and $T_2$ respectively, which are both identified with a single vertex in $G$. The
latter argument can be used for the case where both edges belong to the same tree $T_1$ but lie
in different branches. Hence we obtain again a contradiction.

3.5 Proof of Proposition 8

Proof. As in the proof of Proposition 6, we will rely on a representation of $x^t_i(r)$ based on labeled
trees defined as in Section 3.2. In the present case it is however more convenient to work with trees
from which marks have been removed, i.e. we identify any two trees in which the vertex marks are
different but the types are the same. Notice that Eqs. (3.11), (3.12) imply

$$z^t_{i\to j}(r) = \sum_{T\in\overline{\mathcal U}^t_{i\to j}(r)} A(T)\,\Gamma'(T,c,t)\,x(T), \tag{3.26}$$
$$z^t_{i}(r) = \sum_{T\in\overline{\mathcal U}^t_{i}(r)} A(T)\,\Gamma'(T,c,t)\,x(T), \tag{3.27}$$

where $\Gamma'(T, c, t)$ is obtained by summing $\Gamma(T, c, t)$ over all trees $T$ that coincide up to marks. In the
following, with a slight abuse of notation, we will write $\Gamma(T,c,t)$ instead of $\Gamma'(T,c,t)$.

In a directed labeled graph, we define a backtracking path of length 3 as a path $a \to b \to c \to d$
such that $\ell(a) = \ell(c)$ and $\ell(b) = \ell(d)$. We define a backtracking star as a set of vertices $a \to b \to c$
and $a' (\ne a) \to b$ such that $\ell(a) = \ell(a') = \ell(c)$. We define $\mathcal B^t$ as the set of rooted labeled trees $T$ in
$\mathcal U^t$ that satisfy the following conditions:

• If $u \to v \in E(T)$, then $\ell(u) \ne \ell(v)$ and there exists in $T$ at least one backtracking path of
length 3 or one backtracking star.

Then, we define $\mathcal B^t_i$ as the subset of trees in $\mathcal B^t$ with root having type $i$ and only one child with type
$\ell$ with $\ell \ne i$.
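The two patterns that define these trees can be detected directly from parent pointers. The sketch below assumes that edges point from a child to its parent, so that in a backtracking star the vertices $a$ and $a'$ are two distinct children of $b$; the dictionary encoding and the name `has_backtracking_pattern` are hypothetical.

```python
def has_backtracking_pattern(parent, vtype):
    """Detect a backtracking path of length 3 or a backtracking star.

    parent[v] is the parent of v (the root has no entry) and vtype[v]
    is the type of v. Edges are read as child -> parent.
    """
    children = {}
    for child, par in parent.items():
        children.setdefault(par, []).append(child)

    for a, b in parent.items():
        c = parent.get(b)
        if c is None or vtype[a] != vtype[c]:
            continue
        # Path a -> b -> c -> d with l(a) = l(c) and l(b) = l(d).
        d = parent.get(c)
        if d is not None and vtype[b] == vtype[d]:
            return True
        # Star: a second child a2 of b with l(a2) = l(a) = l(c).
        if any(a2 != a and vtype[a2] == vtype[a] for a2 in children[b]):
            return True
    return False


path_tree = ({"c": "d", "b": "c", "a": "b"}, {"d": 2, "c": 1, "b": 2, "a": 1})
star_tree = ({"b": "c", "a": "b", "a2": "b"}, {"c": 1, "b": 2, "a": 1, "a2": 1})
single_backtrack = ({"b": "c", "a": "b"}, {"c": 1, "b": 2, "a": 1})
```

A single backtrack with types $i, \ell, i$ is not enough: the function returns `False` on `single_backtrack`, since neither a path of length 3 nor a star is present.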

Lemma 3. Under the same assumptions as in Proposition 8, we have

$$x^t_i(r) = z^t_i(r) + \sum_{T\in\mathcal B^t_i} A(T)\,\tilde\Gamma(T,t,r)\,x(T),$$

for some $\tilde\Gamma(T,t,r)$ which is bounded uniformly as $|\tilde\Gamma(T,t,r)| \le K(d,C,t)$.

Proof. Following the same argument as in Lemma 1, it is easy to prove by induction on $t$ that we
can find $\tilde\Gamma(T, t, r)$ such that

$$x^t_i(r) = \sum_{T\in\mathcal U^t} A(T)\,\tilde\Gamma(T,t,r)\,x(T), \tag{3.28}$$

with $|\tilde\Gamma(T, t, r)| \le K(d, C, t)$. The terms $A_{\ell i} f^\ell(x^t_\ell, t)$ can be handled exactly as in Lemma 1. Concerning
the terms $A_{\ell i}^2 f^i(x^{t-1}_i, t-1) \frac{\partial f^\ell}{\partial x}(x^t_\ell, t)$, each can be interpreted as a sum on the following trees in $\mathcal U$:
the type of the root is $i$ and the root has one child with type $\ell$. This child has at most $d - 1$ subtrees
in $\mathcal U^t$ coming from the term $\frac{\partial f^\ell}{\partial x}(x^t_\ell, t)$ (which is a polynomial with degree at most $d - 1$) and one
child, say $u$, with type $i$. This child $u$ is the root of at most $d$ subtrees in $\mathcal U^{t-1}$ coming from the term
$f^i(x^{t-1}_i, t-1)$. We see that the resulting tree is in $\mathcal U^{t+1}$. Now to see that $|\tilde\Gamma(T,t,r)| \le K(d,C,t)$,
note that each polynomial $f^\ell_r(\,\cdot\,,t)$ (resp. $\frac{\partial f^\ell_r}{\partial x(s)}(\,\cdot\,, t)$) has coefficients bounded by $C$ (resp. $dC$) so
that taking into account the contribution of each term in decomposition (3.28), we easily get

$$|\tilde\Gamma(T, t+1, r)| \le dC^2\big[K(d, C, t)^d + K(d, C, t)^{d-1} K(d, C, t-1)^d\big].$$

It remains to prove that $\tilde\Gamma(T,t,r)$ agrees with the expression in Lemma 1, cf. Eqs. (3.26), (3.27),
for $T \in \overline{\mathcal U}^t_i(r)$ and is zero for trees in $\mathcal U^t \setminus \mathcal B^t_i$. The proof of this fact will proceed by induction on $t$.
The cases $t = 0, 1$ are clear since $\mathcal B^1_i = \emptyset$. For $t \ge 1$, we define



4i(r) = ^/;(z*^,i - 1), e|(r) = E A(T)f (T, i, r)x(T), 4 /t (r) = ^(r) + e\{r) 

so that we have by the induction hypothesis, xj, = z^_^ + i + = + d\ ,-. 
Since /~( • , t) is a polynomial, we have 

fM,t) = ^(4^,t)+E(4iW +e 5W)^)( a ^' t ) 

+ 1_1 n j dx{\y*...dz(q) n * K h 

where the last sum contains a finite number of non-zero terms. 

Multiplying by $A_{\ell i}$ and summing over $\ell \in [N]$, the first term on the right-hand side gives exactly
$z^{t+1}_i(r)$. The second term gives:

From now on and to lighten the notation, we omit the second argument of the functions Hence 
we have 

+ E^E^w^-U) < 3 - 29 ) 

We now show that each contribution on the right hand side (except can be written as a sum 

of terms A(T)T(T, t + l,r, rc°) over trees T E <6* +1 that we construct explicitly. 

First consider the terms of the form: Ane\{s) gjfo (z^_^). By definition e^(s) can be written as a 
sum over trees in B\ and by Lemma [H the r-th component of z|_^ can be written as a sum over trees 

in W|_^(r). Hence by the same argument as in the proof of LemmaCQ we see that A^e^s) (x|_^) 
can be written as a sum over trees with root having type i, one child say v with type I. This vertex v 
is the root of a tree in B\ (corresponding to the factor e\(s)) and a set of trees in ttj^^l), . . . jUj^^q) 

(corresponding to the factor (z^-)). This tree clearly belongs to £>* +1 . 
We now treat the terms in the first line. Again, we have 

r) f 1 r)f l 
where g is a polynomial with either a positive power of a component of . or of d £ v Hence, we only 
need to construct trees in B\ +1 {r) corresponding to terms of the following form: for ^2 s (a s + b s ) > 1, 



i&wT (^M bs coor (^))' 



Let first consider the term: 



(s)) Cs (zj^ i (s)) da . It can be interpreted as a sum on the 
following family of trees: the type of the root is i and the root has one child with type I. This 
child has d s subtrees in ^s) and one child denoted u with type i. This child u has c s subtrees 
in Ul~^(s). Note that the only backtracking path in such a tree is the path from u to the root with 
types i,£, i. In particular such a tree does not belong to Bj(r). 

We assume now that there exists s with a s > 1. We need to interpret the multiplication by 
c ^i£ 1 (' s ) = + e i _1 ( s )- First consider the case of e* _1 (s), this corresponds to add a subtree in 

B l ~ x to the vertex u. As in previous analysis, we clearly obtain a tree in £>* +1 . The term z\~^ (s) 
corresponds to adding a child of type I to the vertex u which is the root of a subtree in Wj^(s), in 
particular we introduce a backtracking path of length 3 so that again the resulting tree is in Bj +1 . 
Similarly if b s > 1, the multiplication by cfaAs) will correspond to add a subtree to the child of the 
root, resulting in either adding a backtracking path of length 3 or adding a backtracking star. 
The last term of the form 



.4 



a 



n 



dUs) 



Qni-\ \-n q jl 



n 



.). <9z(l) n i . . .dx(q) n i 



with n\ + ■ ■ ■ + n„ > 2 can be analyzed by the same kind of argument by noticing that the factor 



Ai£zj i (s)zj i (s') corresponds to a backtracking star. 



□ 



The proof of Proposition 8 follows from the same arguments as in the proof of Proposition 6.
Once more, for simplicity, we only consider the case $m(r) = m$ and $m(s) = 0$ for $s \ne r$, the general
case of $m = (m(1), m(2), \dots, m(q)) \in \mathbb N^q$ being completely analogous. We represent both moments
$\mathbb E[x^t_i(r)^m]$ and $\mathbb E[z^t_i(r)^m]$ using Lemma 1 (in the form given in Eqs. (3.26), (3.27)) and Lemma 3. The
expectation $\mathbb E[x^t_i(r)^m]$ is represented as a sum over trees $T_1, \dots, T_m \in \overline{\mathcal U}^t_i(r) \cup \mathcal B^t_i(r)$, while $\mathbb E[z^t_i(r)^m]$
is given by a sum over trees $T_1, \dots, T_m \in \overline{\mathcal U}^t_i(r)$. In order to complete the proof we need to show that
the contribution of terms that have at least one tree in $\mathcal B^t_i(r)$ vanishes as $N \to \infty$.



The factor $\prod_{\ell=1}^m \tilde\Gamma(T_\ell, t, r)$ is bounded by $K(d,C,t)^m$, which is independent of $N$. Hence, we
only need to prove that



E 



E 



E 



Ti eB| (r) Tj 67? {r 3 )UB\ {r 3 ) ,j G [2,m] 



l[A(T e )x(T e 



ON 



(3.30) 



This statement directly follows from the previous analysis, since in the graph $G$ obtained by taking the
union of the $T_\ell$'s and identifying vertices $v$ with the same type $\ell(v)$, there is at least one edge with
multiplicity 3, due to the backtracking path of length 3 or the backtracking star in $T_1$. Hence the
previous analysis shows that the term in (3.30) is of order $O(N^{-1/2})$. □
3.6 Proof of Theorem H 

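Let $\{p_{N,i}\}_{N\ge 0,\,1\le i\le N}$ be a collection of multivariate polynomials $p_{N,i} : \mathbb R^q \to \mathbb R$ with degrees bounded
by $D$, and coefficients bounded in magnitude by $B$:

$$p_{N,i}(x) = \sum_{m(1)+\cdots+m(q)\le D} a^{N,i}_{m(1),\dots,m(q)}\, x(1)^{m(1)} \cdots x(q)^{m(q)}. \tag{3.31}$$

By Propositions 6 and 8, we have

$$\big|\mathbb E\, p_{N,i}(x^t_i) - \mathbb E\, p_{N,i}(\tilde x^t_i)\big| \le B \sum_{m(1)+\cdots+m(q)\le D} \big|\mathbb E[(x^t_i)^m] - \mathbb E[(\tilde x^t_i)^m]\big| \le K D^q B N^{-1/2}, \tag{3.32}$$

whence the thesis follows.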

3.7 Proof of Theorem O 

An important simplification is provided by the following. 

Remark 2. It is sufficient to prove Theorem^ for t = s. 
(Hence, Theorem \4\ implies Theorem^) 

Proof. Indeed consider a converging sequence $\{(A(N), \mathcal F_N, x^{0,N})\}_{N\ge 1}$ and fix $h = t - s > 0$. For
the sake of simplicity, and in view of Remark 1, we can assume $\mathcal F_N$ to be given by the polynomial
function $g : \mathbb R^q \times \mathbb R^{\tilde q} \times [k] \times \mathbb N \to \mathbb R^q$, $(x, Y, a, t) \mapsto g(x, Y, a, t)$ that does not depend on the random
variable $Y$. With an abuse of notation we will write $g(x, a, t)$ in place of $g(x, Y, a, t)$.

We will construct a new converging sequence of instances $\{(A(N), \tilde{\mathcal F}_N, \tilde x^{0,N})\}_{N\ge 1}$ with variables
$\tilde x^t_i \in \mathbb R^{2q}$ and such that, letting $\tilde x^t_i = (u^t_i, v^t_i)$, $u^t_i, v^t_i \in \mathbb R^q$, the pair $(u^t_i, v^t_i)$ is distributed as $(x^t_i, x^{t-h}_i)$
asymptotically as $N \to \infty$.

The new sequence of initial conditions is constructed as follows.

1. The initial condition is given by $\tilde x^0_i = (0, 0)$.

2. The independent randomness is given by $Y(i) = x^0_i$. Notice that, for $i \in C^N_a$, we have
$Y(i) \sim_{\rm i.i.d.} Q_a$ and hence we let $P_a = Q_a$.

3. The partitions $C^N_a$, $a \in [k]$, and matrices $A(N)$ are kept unchanged.

4. The collection of functions in $\tilde{\mathcal F}_N$ is determined by the polynomial function $\tilde g : \mathbb R^{2q} \times \mathbb R^q \times [k] \times
\mathbb N \to \mathbb R^{2q}$, $(\tilde x, Y, a, t) \mapsto \tilde g(\tilde x, Y, a, t)$. Writing $\tilde g(\,\cdot\,) = [\tilde g^{(1)}(\,\cdot\,), \tilde g^{(2)}(\,\cdot\,)]$, with $\tilde g^{(1)}(\,\cdot\,), \tilde g^{(2)}(\,\cdot\,) \in \mathbb R^q$,
we let, for $u, v \in \mathbb R^q$,

Mi7 A fg(Y,a,t) ift = 0, 

^ ' (u,v),Y,a,i = < (3.33) 

[ff(u,a,i) ifi>0, 

(2)^ \ V A )9(Y,a,t) if t<h, 
9 ( >{{u,v),Y,a,t) = i (3.34) 

I g(v, a, r) it t > a. 

As a consequence of this construction, $u^t_i = x^t_i$ for all $i \in [N]$, $t \ge 1$, and $v^t_i = x^{t-h}_i$ for all $t \ge h + 1$.
This completes the reduction. □
As a consequence of this remark, it is sufficient to prove Theorem UJ and by Remark Q] we can 
limit ourselves to the case in which g : (x, Y, a, £) i— >■ g(x, Y, a, £) does not depend on Y and hence 
this argument will be dropped. We begin by considering the expectation of moments of x|. 

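Proposition 10. Let $(A(N), \mathcal F_N, x^0)_{N\ge 0}$ be a polynomial and converging sequence of AMP instances,
and denote by $\{x^t\}_{t\ge 0}$ the corresponding AMP orbit. Then we have for any $i = i(N) \in C^N_a$, $t \ge 1$,
$m = (m(1), \dots, m(q)) \in \mathbb N^q$,

$$\lim_{N\to\infty} \mathbb E\big[(x^t_i)^m\big] = \mathbb E\big[(Z^t_a)^m\big],$$

where $Z^t_a \sim N(0, \Sigma^t_a)$.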



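Proof. By Propositions 7 and 8, we need only to prove the statement for the AMP orbit $y^t$. We will
indeed prove by induction on $t$ that for any $i \in C^N_a$ and any $j \ne i$,

$$\lim_{N\to\infty} \mathbb E\big[(y^t_{i\to j})^m\big] = \mathbb E\big[(Z^t_a)^m\big], \tag{3.35}$$

$$\lim_{N\to\infty} \frac{1}{|C^N_a|}\sum_{i\in C^N_a} (y^t_{i\to j})^m = \mathbb E\big[(Z^t_a)^m\big] \quad \text{in probability}. \tag{3.36}$$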



For $t \ge 1$, let $\mathfrak F_t$ be the $\sigma$-algebra generated by $A^0, \dots, A^{t-1}$. We will show, using the central limit
theorem, that the random vector $(y^{t+1}_{i\to j}(1), \dots, y^{t+1}_{i\to j}(q))$ given $\mathfrak F_t$ converges in distribution to a centered
Gaussian random vector. More precisely, by (3.8) and the induction hypothesis, the following limit
holds in probability,

$$\lim_{N\to\infty} \mathbb E\big[y^{t+1}_{i\to j}(r)\, y^{t+1}_{i\to j}(s)\,\big|\,\mathfrak F_t\big] = \lim_{N\to\infty} \sum_{\ell\in[N]\setminus j} \mathbb E\big[(A^t_{\ell i})^2\big]\, g_r(y^t_{\ell\to i}, b, t)\, g_s(y^t_{\ell\to i}, b, t)$$
$$= \sum_{b=1}^{k} c_b W_{ab}\, \mathbb E\big[g_r(Z^t_b, b, t)\, g_s(Z^t_b, b, t)\big] = \Sigma^{t+1}_a(r, s).$$

Since for all $r \in [q]$ from (3.8) we have $\mathbb E[y^{t+1}_{i\to j}(r)] = 0$, from the central limit theorem, it follows that
$y^{t+1}_{i\to j}$ converges to a centered Gaussian vector with covariance $\Sigma^{t+1}_a$. Since all the moments of $y^{t+1}_{i\to j}$
are bounded uniformly in $N$ by Proposition 7 and Lemma 2, the induction claim, Eq. (3.35), follows
for iteration $t + 1$.

In the base case $t = 0$ the same conclusion holds because



lim E [yl+j{r)y\ _^(s) 



lim Yl E [( A °ti) 2 ] 9r(y°i^, b, 0)9s(yLi, b, 0) 



ee[N]\j 



Y,CbW ab X° b (r,s) 



6=1 



where the second identity holds by assumption.

Next consider the induction claim Eq. (3.36). Recall the representation introduced in Section 3.4,

$$y^t_{i\to j}(r) = \sum_{T\in\overline{\mathcal T}^t_{i\to j}(r)} \bar A(T,t)\,\Gamma(T,c,t)\,x(T), \qquad \bar A(T,t) = \prod_{(u\to v)\in E(T)} A^{t-|u|}_{\ell(u)\ell(v)}.$$



Using this representation of $y^t_{i\to j}$, $y^t_{k\to j}$ it is easy to show that, for $i \ne k$, $i, k \in C^N_a$,



(3.37) 



for some function $\varepsilon(N) \to 0$ as $N \to \infty$ at $m, C, d, t$ fixed. Indeed, the above expectations can
be represented as sums over $m = m(1) + m(2) + \cdots + m(q)$ trees $T_1, \dots, T_m \in \overline{\mathcal T}^t_{i\to j}$ and $m$ trees
$T'_1, \dots, T'_m \in \overline{\mathcal T}^t_{k\to j}$. Let $G$ be the simple graph obtained by identifying vertices of the same type in
$T_1, \dots, T_m, T'_1, \dots, T'_m$.

By Lemma 2 and the argument in the proof of Proposition 6, all the terms in which $G$ has
cycles, or in which an edge of $G$ corresponds to more than 2 edges in the union of $T_1, \dots, T_m, T'_1, \dots, T'_m$,
add up to a vanishing contribution in the $N \to \infty$ limit. Further, all the terms in which $G$ is the
union of two disconnected components (one containing $i$, and the other containing $k$) are identical
in $\mathbb E\big[(y^t_{i\to j})^m (y^t_{k\to j})^m\big]$ and $\mathbb E\big[(y^t_{i\to j})^m\big]\,\mathbb E\big[(y^t_{k\to j})^m\big]$ and hence cancel out. We are therefore left
with the sum over trees $T_1, \dots, T_m, T'_1, \dots, T'_m$ such that $G$ is itself a connected tree, with edges
covered exactly twice. Assume, to be definite, that $G$ has $\mu$ vertices and hence $\mu - 1$ edges. The
weight of such a term is bounded by

$$K\, \mathbb E\Big|\prod_{\ell=1}^{m} \bar A(T_\ell,t) \prod_{\ell=1}^{m} \bar A(T'_\ell,t)\Big| \le K N^{-(\mu-1)}.$$

On the other hand, the number of such terms is bounded by $K N^{\mu-2}$ (because the type has to be
assigned to $\mu$ vertices, but 2 of these are fixed to $i$ and $k$), and hence the overall contribution of these
terms vanishes as well.

From Eq. (3.37) and using the fact that $\mathbb E[(y^t_{i\to j})^{2m}] \le K$ (because of Lemma 2 and Proposition
7), we have



Equation (3.36) follows for iteration $t + 1$ by applying Chebyshev's inequality to the sequence

$$\Big\{\frac{1}{|C^N_a|}\sum_{i\in C^N_a} (y^{t+1}_{i\to j})^m\Big\}_{N>0}$$

and using (3.35). □

We are now ready to prove Theorem 5 in the case in which $\psi : \mathbb R^q \to \mathbb R$ is a polynomial.

Proposition 11. Let $(A(N), \mathcal F_N, x^0)_{N\ge 0}$ be a polynomial and converging sequence of AMP
instances, and denote by $\{x^t\}_{t\ge 0}$ the corresponding AMP orbit. Then we have for any $t \ge 1$, $m =
(m(1), \dots, m(q)) \in \mathbb N^q$,

$$\lim_{N\to\infty} \mathrm{Var}\Big\{\frac{1}{|C^N_a|}\sum_{i\in C^N_a} (x^t_i)^m\Big\} = 0. \tag{3.38}$$

Proof. In order to prove (3.38), we fix $t \ge 1$ and $a \in [k]$, and construct a modified sequence of
AMP instances as follows. The new sequence has $N' = 2N$ and $k' = k + 1$. The new partition
of the variable indices $\{1, \dots, N\}$ is the same as in the original instances, with the addition of
$C^{N'}_{k+1} = \{N + 1, \dots, 2N = N'\}$. Further we set, for $\psi : \mathbb R^q \to \mathbb R$ a polynomial,

1. For $i, j \le N$: $A'_{ij} = A_{ij}$, and when $i > N$ or $j > N$ define $A'_{ij} \sim N(0, 1/N)$ independently.

2. $g'(x, b, t') = g(x, b, t')$ for $b \in [k]$, $t' \le t - 1$; $g'(x, b, t) = 0$ for $b \in [k] \setminus a$; $g'_1(x, a, t) = \psi(x)$,
$g'_r(x, a, t) = 0$ for $r \ge 2$; $g'(x, k + 1, t') = 0$ for all $t'$.

The definition of $g'(x, a, t')$ for $t' > t$ is irrelevant for our purposes.

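Since $g'(x, k + 1, t') = 0$ for all $t'$, the orbit $(x^{t'}_i : i \le N, t' \le t)$ is not affected by the new variables.
Further, by the general AMP equation (1.6), we have, for $i \in C^{N'}_{k+1}$,

$$x^{t+1}_i(1) = \sum_{j\in C^N_a} A_{ij}\,\psi(x^t_j). \tag{3.39}$$

Notice that the $\{A_{ij}\}_{j\in C^N_a}$ in this equation are independent of $x^t_j$. Hence

$$\mathbb E\{x^{t+1}_i(1)^4\} = \sum_{j_1,\dots,j_4\in C^N_a} \mathbb E\{A_{ij_1}A_{ij_2}A_{ij_3}A_{ij_4}\}\, \mathbb E\{\psi(x^t_{j_1})\psi(x^t_{j_2})\psi(x^t_{j_3})\psi(x^t_{j_4})\} \tag{3.40}$$

$$= \frac{3}{N^2} \sum_{j_1,j_2\in C^N_a} \mathbb E\{\psi(x^t_{j_1})^2\,\psi(x^t_{j_2})^2\}. \tag{3.41}$$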
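The step from the fourth-order sum to a double sum uses only the Gaussian pairing (Isserlis) formula for independent centered entries. The sketch below checks the resulting identity exactly for fixed coefficients `c`, which stand in for the values $\psi(x^t_j)$ conditionally on the orbit; this substitution and the name `fourth_moment_sum` are illustrative assumptions.

```python
from itertools import product


def fourth_moment_sum(c, var):
    """Sum over (j1, j2, j3, j4) of E[A_j1 A_j2 A_j3 A_j4] * c_j1 c_j2 c_j3 c_j4.

    The A_j are independent N(0, var); each expectation is evaluated with
    the pairing formula, i.e. var^2 times the number of matching pairings.
    """
    total = 0.0
    indices = range(len(c))
    for j1, j2, j3, j4 in product(indices, repeat=4):
        pairings = (
            (j1 == j2 and j3 == j4)
            + (j1 == j3 and j2 == j4)
            + (j1 == j4 and j2 == j3)
        )
        total += var ** 2 * pairings * c[j1] * c[j2] * c[j3] * c[j4]
    return total


c = [1.0, 2.0, -1.0]
var = 0.25
lhs = fourth_moment_sum(c, var)
rhs = 3 * var ** 2 * sum(cj ** 2 for cj in c) ** 2
```

With `var` equal to $1/N$, `rhs` is the prefactor $3/N^2$ times the square of $\sum_j c_j^2$, matching the structure of the double sum.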

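On the other hand, using Proposition 10 (once for iteration $t + 1$ and $i \in C^{N'}_{k+1}$, and another time for
iteration $t$ and $i \in C^N_a$) we get

$$\lim_{N\to\infty} \mathbb E\{x^{t+1}_i(1)^4\} = \mathbb E\{(Z^{t+1}_{k+1}(1))^4\} = 3\big(\Sigma^{t+1}_{k+1}(1,1)\big)^2 = 3 c_a^2\, \mathbb E\{\psi(Z^t_a)^2\}^2, \quad i \in C^{N'}_{k+1}, \tag{3.42}$$

$$\lim_{N\to\infty} \mathbb E\{\psi(x^t_i)^2\} = \mathbb E\{\psi(Z^t_a)^2\}, \quad i \in C^N_a, \tag{3.43}$$

where $Z^t_a \sim N(0, \Sigma^t_a)$. Comparing these equations with Eq. (3.41) we conclude that

$$\lim_{N\to\infty} \frac{1}{|C^N_a|^2}\sum_{i,j\in C^N_a} \mathbb E\{\psi(x^t_i)^2\,\psi(x^t_j)^2\} = \Big(\lim_{N\to\infty} \frac{1}{|C^N_a|}\sum_{i\in C^N_a} \mathbb E\big[\psi(x^t_i)^2\big]\Big)^2. \tag{3.44}$$

Equivalently

$$\lim_{N\to\infty} \mathrm{Var}\Big\{\frac{1}{|C^N_a|}\sum_{i\in C^N_a} \psi(x^t_i)^2\Big\} = 0. \tag{3.45}$$

Taking $\psi(x) = x^k$, we obtain Eq. (3.38) for $m$ even. In order to establish Eq. (3.38) for general $m$ we
take, for instance, $\psi(x) = 1 + \varepsilon x^m$ and use the fact that the limit must vanish for all $\varepsilon$. □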

At this point we can prove Theorem 

Proof of Theorem^ By Remark Q] and Remark [2j we reduced ourselves to the case t = s, and 
Y(i) = (equivalently, Y(i), is absent). 

Consider the empirical measure on $\mathbb R^q$ given by

$$\mu^t_a = \frac{1}{|C^N_a|}\sum_{i\in C^N_a} \delta_{x^t_i}.$$

Proposition 10 shows the convergence of the expected moments of $\mu^t_a$ to moments that determine
the Gaussian distribution. Proposition 11 combined with Chebyshev's inequality implies

$$\lim_{N\to\infty} \mu^t_a\big((x)^m\big) = \mathbb E\big[(Z^t_a)^m\big]$$

in probability. The proof follows using the relation between convergence in probability and almost sure
convergence along subsequences, together with the moment method. □



4 Non-symmetric matrices 

In this section we consider a slightly different setting that turns out to be a special case of the one 
introduced in Section [l~3l 

Definition 12. A converging sequence of (polynomial) bipartite AMP instances $\{(A(n), f, h, x^{0,n})\}_{n\ge 1}$
is defined by giving for each $n$:

1. A matrix $A(n) \in \mathbb R^{m\times n}$ with $m = m(n)$ such that $\lim_{n\to\infty} m(n)/n = \delta > 0$. Further, $A(n) =
(A_{ij})_{i\le m,\, j\le n}$ is a matrix with the entries $A_{ij}$ independent subgaussian random variables with
common scale factor $C/n$ and first two moments $\mathbb E\{A_{ij}\} = 0$, $\mathbb E\{A_{ij}^2\} = 1/m$.

2. Two functions $f : \mathbb R^q \times \mathbb R^{\tilde q} \times \mathbb N \to \mathbb R^q$ and $h : \mathbb R^q \times \mathbb R^{\tilde q} \times \mathbb N \to \mathbb R^q$ such that, for each $t \ge 0$,
$f(\,\cdot\,, \,\cdot\,, t)$ and $h(\,\cdot\,, \,\cdot\,, t)$ are polynomials.

3. An initial condition $x^{0,n} = (x^0_1, \dots, x^0_n) \in \mathcal V_{q,n} \simeq (\mathbb R^q)^n$, with $x^0_i \in \mathbb R^q$, such that, in probability,

$$\sum_{i=1}^{n} \exp\big\{\|x^{0,n}_i\|_2^2/C\big\} \le nC, \tag{4.1}$$

$$\lim_{n\to\infty} \frac{1}{m(n)} \sum_{i=1}^{n} f(x^0_i, Y(i), 0)\, f(x^0_i, Y(i), 0)^{\sf T} = \Xi^0. \tag{4.2}$$

4. Two collections of i.i.d. random variables $(Y(i), i \in [n])$ and $(W(j), j \in [m])$ with $Y(i) \sim_{\rm i.i.d.} Q$
and $W(j) \sim_{\rm i.i.d.} P$. Here $Q$ and $P$ are finite mixtures of Gaussians on $\mathbb R^{\tilde q}$.

Throughout this section, we will refer to the non-bipartite AMP instances defined earlier as
symmetric instances. With these ingredients, we define the AMP orbit as follows.

Definition 13. The approximate message passing orbit corresponding to the bipartite instance
$(A, f, h, x^0)$ is the sequence of vectors $\{x^t, z^t\}_{t\ge 0}$, $x^t \in \mathcal V_{q,n}$, $z^t \in \mathcal V_{q,m}$, defined as follows, for $t \ge 0$,

$$z^t = A\, f(x^t, Y; t) - B_t\, h(z^{t-1}, W; t-1), \tag{4.3}$$
$$x^{t+1} = A^{\sf T} h(z^t, W; t) - D_t\, f(x^t, Y; t), \tag{4.4}$$

where $f(\cdots)$, $h(\cdots)$ are applied componentwise (see below for an explicit formulation). Here $B_t :
\mathcal V_{q,m} \to \mathcal V_{q,m}$ is the linear operator defined by letting, for $v' = B_t v$, and any $j \in [m]$,

$$v'_j = \Big(\sum_{k\in[n]} A_{jk}^2\, \frac{\partial f}{\partial x}(x^t_k, Y(k); t)\Big) v_j. \tag{4.5}$$

Analogously $D_t : \mathcal V_{q,n} \to \mathcal V_{q,n}$ is the linear operator defined by letting, for $v' = D_t v$, and any $j \in [n]$,

$$v'_j = \Big(\sum_{i\in[m]} A_{ij}^2\, \frac{\partial h}{\partial z}(z^t_i, W(i); t)\Big) v_j. \tag{4.6}$$

For the sake of clarity, it is useful to rewrite the iteration (4.3), (4.4) explicitly, by components:

$$z^t_i = \sum_{j\in[n]} A_{ij}\, f(x^t_j, Y(j); t) - \sum_{k\in[n]} A_{ik}^2\, \frac{\partial f}{\partial x}(x^t_k, Y(k); t)\, h(z^{t-1}_i, W(i); t-1) \quad \text{for all } i \in [m],$$

$$x^{t+1}_j = \sum_{i\in[m]} A_{ij}\, h(z^t_i, W(i); t) - \sum_{i\in[m]} A_{ij}^2\, \frac{\partial h}{\partial z}(z^t_i, W(i); t)\, f(x^t_j, Y(j); t) \quad \text{for all } j \in [n].$$



i£[m] i€[m] 

We will state and prove a state evolution result that is analogous to Theorem 5 for the present case.
Since the proof is by reduction to the symmetric case, the same argument also implies a universality
statement of the type of Theorem 3. However, we will not state explicitly any universality statement
in this case. We begin by introducing the appropriate state evolution recursion. In analogy with
Eq. (1.10), we introduce two sequences of positive semidefinite matrices $\{\Sigma^t\}_{t\ge 0}$, $\{\Xi^t\}_{t\ge 0}$ by letting
$\Xi^0$ be given as per Eq. (4.2) and defining, for all $t \ge 1$,

$$\Sigma^t = \mathbb E\big\{h(Z^{t-1}, W, t-1)\, h(Z^{t-1}, W, t-1)^{\sf T}\big\}, \qquad Z^{t-1} \sim N(0, \Xi^{t-1}),\ W \sim P, \tag{4.7}$$

$$\Xi^t = \frac{1}{\delta}\, \mathbb E\big\{f(X^t, Y, t)\, f(X^t, Y, t)^{\sf T}\big\}, \qquad X^t \sim N(0, \Sigma^t),\ Y \sim Q. \tag{4.8}$$
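For scalar polynomial nonlinearities the state evolution recursion can be evaluated in closed form, because Gaussian moments of polynomials are explicit. The sketch assumes $q = 1$, no dependence on $Y$, $W$ or on time, and the recursion $\Sigma^t = \mathbb E\{h(Z)^2\}$ with $Z \sim N(0, \Xi^{t-1})$, $\Xi^t = \delta^{-1}\mathbb E\{f(X)^2\}$ with $X \sim N(0, \Sigma^t)$. Polynomials are passed as coefficient lists, and all function names are hypothetical.

```python
def gaussian_moment(k, var):
    """E[X^k] for X ~ N(0, var): zero for odd k, (k-1)!! * var^(k/2) for even k."""
    if k % 2 == 1:
        return 0.0
    double_factorial = 1
    for j in range(k - 1, 0, -2):
        double_factorial *= j
    return double_factorial * var ** (k // 2)


def second_moment_of_poly(coeffs, var):
    """E[p(X)^2] for p(x) = sum_k coeffs[k] * x^k and X ~ N(0, var)."""
    return sum(
        a * b * gaussian_moment(i + j, var)
        for i, a in enumerate(coeffs)
        for j, b in enumerate(coeffs)
    )


def state_evolution(f_coeffs, h_coeffs, xi0, delta, steps):
    """Iterate Sigma^t = E[h(Z)^2], Xi^t = E[f(X)^2] / delta; return the (Sigma, Xi) pairs."""
    xi, history = xi0, []
    for _ in range(steps):
        sigma = second_moment_of_poly(h_coeffs, xi)          # Z ~ N(0, Xi^{t-1})
        xi = second_moment_of_poly(f_coeffs, sigma) / delta  # X ~ N(0, Sigma^t)
        history.append((sigma, xi))
    return history


# f(x) = x and h(z) = z^2, started from Xi^0 = 1 with delta = 1/2.
history = state_evolution([0, 1], [0, 0, 1], 1.0, 0.5, 2)
```

With $h(z) = z^2$ the first update gives $\Sigma^1 = \mathbb E\{Z^4\} = 3(\Xi^0)^2$, which is the value the sketch returns.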

We also define a two-times recursion analogous to Eqs. (3.2), (3.3). Namely, we introduce the
boundary condition

A ° " f % to) , &° = (% So) , ^ = , (4-9) 
with $\Sigma^t$ defined per Eqs. (4.7), (4.8). For any $s, t \ge 1$, we set recursively

S^^El^x^x^I^} , (4.10) 

Z t -i, s -i = [h(Z^\W, t - 1), h(Z s ~\ W, 8-1)], (4.11) 

S^ s = E{x tjS Xl] , (4.12) 

[/(x*,y,t),/(x',y,s)]. (4.13) 



(Recall that [u, v] denotes the column vector obtained by concatenating u and v.) 

Theorem 6. Let $\{(A(n), f, h, x^{0,n})\}_{n\ge 1}$ be a polynomial and converging sequence of bipartite AMP
instances, and denote by $\{x^t, z^t\}_{t\ge 0}$ the corresponding AMP orbit.

Fix $s, t \ge 1$. If $s \ne t$, further assume that the initial condition $x^{0,n}$ is obtained by letting $x^{0,n}_i \sim R$
independent and identically distributed, with $R$ a finite mixture of Gaussians. Then, for each locally
Lipschitz function $\psi : \mathbb R^q \times \mathbb R^q \times \mathbb R^{\tilde q} \to \mathbb R$ such that $|\psi(x, x', y)| \le K(1 + \|y\|_2^2 + \|x\|_2^2 + \|x'\|_2^2)^K$, we
have, in probability,

$$\lim_{n\to\infty} \frac1n \sum_{j\in[n]} \psi(x^t_j, x^s_j, Y(j)) = \mathbb E\big[\psi(X^t, X^s, Y)\big], \tag{4.14}$$

$$\lim_{n\to\infty} \frac{1}{m(n)} \sum_{j\in[m]} \psi(z^t_j, z^s_j, W(j)) = \mathbb E\big[\psi(Z^t, Z^s, W)\big], \tag{4.15}$$

where $(X^t, X^s) \sim N(0, \Sigma^{t,s})$ is independent of $Y \sim Q$, and $(Z^t, Z^s) \sim N(0, \Xi^{t,s})$ is independent of
$W \sim P$.

Proof. The proof follows by constructing a suitable polynomial and converging sequence of symmetric
instances, recognizing that a suitable subset of the resulting orbit corresponds to the orbit
of interest, and applying Theorem 5.

Specifically, given a converging sequence of bipartite instances $(A(n), f, h, x^{0,n})$, we construct a
symmetric instance $(A_s(N), g, x_s^{0,N})$ with (below we use the subscript $s$ to refer to the symmetric
instance):

1. The symmetric instance has dimensions $N = n + m$ and $q_s = q$, $\tilde q_s = \tilde q$.

2. We partition the index set in $k = 2$ subsets: $[N] = C^N_1 \cup C^N_2$, with $C^N_1 = \{1, \dots, m\}$ and
$C^N_2 = \{m + 1, \dots, m + n\}$. In particular $c_1 = \delta/(1 + \delta)$ and $c_2 = 1/(1 + \delta)$.

3. The symmetric random matrix $A_s$ is given by

$$A_s = \begin{pmatrix} 0 & A \\ A^{\sf T} & 0 \end{pmatrix}.$$

In particular $W_{11} = W_{22} = 0$ and $W_{12} = W_{21} = (1 + \delta)/\delta$.

4. The vertex labels are $Y_s(i) = W(i)$ for $i \le m$ and $Y_s(i) = Y(i - m)$ for $i > m$. In particular,
these are independent random variables with distribution $Y_s(i) \sim P_1 = Q$ if $i \in C^N_1$ and
$Y_s(i) \sim P_2 = P$ if $i \in C^N_2$.
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5. The initial condition is given by xjj'^ = for i 6 and = for i € C^. 

6. Finally, for any $x \in \mathbb{R}^q$, $Y \in \mathbb{R}^{\tilde q}$, $t \ge 0$, we let

\[ g(x,Y,a=1,2t) = f(x,Y,t), \tag{4.16} \]
\[ g(x,Y,a=2,2t+1) = h(x,Y,t). \tag{4.17} \]

The definition of $g(x,Y,a=1,2t+1)$ and $g(x,Y,a=2,2t)$ is irrelevant for our purposes.

The proof is concluded by recognizing that, for all t > 0, 

x^ 1 = 4, for i e Gf, 

□ 



We finish this section with a lemma that establishes continuity of the AMP trajectories with 
respect to Gaussian perturbations of the matrix A. This fact will be used in the next section. (Notice 
that an analogous Lemma holds by the same argument for converging, non-bipartite, instances.) 

Lemma 4. Let $\{(A(n), f, h, x^{0,n})\}_{n\ge 1}$ be a polynomial converging sequence of bipartite AMP
instances and denote by $\{x^t, z^t\}_{t\ge 0}$ the corresponding AMP orbit. For each $n$, let $G(n) \in \mathbb{R}^{m(n)\times n}$
be a random matrix with i.i.d. entries $G(n)_{ij} \sim N(0, 1/m(n))$, independent of $A(n)$. Consider the
perturbed sequence $\{(\tilde A(n) = A(n) + \nu G(n), f, h, x^{0,n})\}_{n\ge 1}$, with $\nu \in \mathbb{R}_+$, and denote by $\{\tilde x^t, \tilde z^t\}_{t\ge 0}$
the corresponding AMP orbit. Then for any $t$ there exists a constant $K$ independent of $n$ such that

\[ \frac{1}{n}\,\mathbb{E}\{\|x^t - \tilde x^t\|_2^2\} \le K(\nu^2 + n^{-1/2}), \qquad \frac{1}{n}\,\mathbb{E}\{\|z^t - \tilde z^t\|_2^2\} \le K(\nu^2 + n^{-1/2}). \]



Proof. Consider the difference $[\tilde x_i^t(r) - x_i^t(r)]$. By the tree representation in Section 3.2 and Lemma 3,
this difference can be written as a polynomial in $A$ and $G$ whereby each monomial has the form

\[ \Gamma(T,t)\, x(T) \Big\{ \prod_{(u\to v)\in E(T)} \tilde A_{\ell(u)\ell(v)} - \prod_{(u\to v)\in E(T)} A_{\ell(u)\ell(v)} \Big\}. \tag{4.18} \]

Enumerating the edges in $T$ as $(u_1,v_1), \dots, (u_k,v_k)$, the quantity in parentheses reads

\[ \sum_{i=1}^{k} \prod_{j=1}^{i-1} \tilde A_{\ell(u_j)\ell(v_j)} \cdot \nu\, G_{\ell(u_i)\ell(v_i)} \cdot \prod_{j=i+1}^{k} A_{\ell(u_j)\ell(v_j)}. \tag{4.19} \]

In other words, the sum over trees $T$ is replaced by a sum over trees with one distinguished edge,
and the edge carries weight $\nu\, G_{\ell(u_i)\ell(v_i)}$. The expectation $\mathbb{E}\{\|\tilde x^t - x^t\|_2^2\}$ is given by a sum over
pairs of such marked trees. Using the fact that the entries of the matrix $\tilde A(n)$ are still independent
subgaussian with scale factor $C/n + \nu^2 C/m(n) \le C'/n$, it is easy to see that the arguments in Lemma 2
and (3.30) are still valid. Hence, up to errors bounded by $K n^{-1/2}$, the only terms that contribute
to this sum are those over pairs of trees such that the graph $G$ obtained by identifying vertices of the
same type has only double edges. In particular, for the distinguished edge, we can use the following
upper bound instead of (3.15): $\mathbb{E}[|\nu G_{ij}|^2] = \nu^2/m \le K\nu^2/n$, and this yields a factor $\nu^2$ (by the same
argument as in the proof of Lemma 2 to get (3.20)). □
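
The telescoping expansion behind Eq. (4.19) can be checked numerically. The following is a minimal sketch, not part of the original argument: scalar weights stand in for the matrix entries along the edges of one tree, and the function names and random test values are illustrative assumptions.

```python
import math
import random


def product_difference(a_tilde, a):
    """Direct evaluation of prod(a_tilde) - prod(a)."""
    return math.prod(a_tilde) - math.prod(a)


def telescoped_sum(a, g, nu):
    """Sum over a distinguished edge i of
    prod_{j<i} a_tilde[j] * (nu * g[i]) * prod_{j>i} a[j],
    where a_tilde[j] = a[j] + nu * g[j]."""
    a_tilde = [aj + nu * gj for aj, gj in zip(a, g)]
    total = 0.0
    for i in range(len(a)):
        left = math.prod(a_tilde[:i])   # edges before i carry perturbed weights
        right = math.prod(a[i + 1:])    # edges after i carry unperturbed weights
        total += left * nu * g[i] * right
    return total


rng = random.Random(0)          # fixed seed, illustrative values only
k, nu = 6, 0.1
a = [rng.gauss(0.0, 1.0) for _ in range(k)]
g = [rng.gauss(0.0, 1.0) for _ in range(k)]
a_tilde = [aj + nu * gj for aj, gj in zip(a, g)]
direct = product_difference(a_tilde, a)
expanded = telescoped_sum(a, g, nu)
print(abs(direct - expanded))   # agreement up to floating-point rounding
```

Each term of the sum contains exactly one factor $\nu G$, which is why every contributing pair of marked trees picks up a factor $\nu^2$ in expectation.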
5 Proof of universality of polytope neighborliness

In this section we prove Theorem 2, deferring several technical steps to the Appendix.

Hypothesis 1. Throughout this section $\{A(n)\}_{n\ge 0}$ is a sequence of random matrices whereby $A(n) \in \mathbb{R}^{m\times n}$
has independent entries that satisfy $\mathbb{E}\{A(n)_{ij}\} = 0$, $\mathbb{E}\{A(n)_{ij}^2\} = 1/m$ and are subgaussian
with scale factor $s/m$, with $s$ independent of $m, n$.

Notice that these matrices differ by a factor $1/\sqrt{m}$ from the matrices in the statement of Theorem 2.
Since neighborliness is invariant under scale transformations, this change is immaterial.

The approach we will follow is based on the equivalence between weak neighborliness and compressed
sensing reconstruction developed in [Don05b, Don05a, DT05b, DT05a]. Within compressed
sensing, one considers the problem of reconstructing a vector $x_0 \in \mathbb{R}^n$ from a vector of linear
'observations' $y = Ax_0$ with $y \in \mathbb{R}^m$ and $m < n$. The measurement matrix $A \in \mathbb{R}^{m\times n}$ is assumed to be
known. An interesting approach towards reconstructing $x_0$ from the linear observations $y$ consists
in solving a convex program:

\[ \hat x(y) = \arg\min\{\|x\|_1 \ \text{such that}\ x \in \mathbb{R}^n,\ y = Ax\}. \tag{5.1} \]

Hence one says that $\ell_1$ minimization succeeds if the above argmin is uniquely defined and $\hat x(y) = x_0$.
Remarkably, this event only depends on the support of $x_0$, $\mathrm{supp}(x_0) = \{i \in [n] : x_{0,i} \ne 0\}$ [Don05b].
This motivates the following abuse of terminology. We say that, for a given matrix $A$, $\ell_1$ minimization
succeeds for a fraction $f$ of vectors $x_0$ with$^3$ $\|x_0\|_0 \le k$ if it succeeds for at least $f\binom{n}{k}$ choices
of $\mathrm{supp}(x_0)$ out of the $\binom{n}{k}$ possible ones. Analogously, $\ell_1$ minimization fails for a fraction $f$ of
vectors $x_0$ if it succeeds for at most $(1-f)\binom{n}{k}$ choices of $\mathrm{supp}(x_0)$.

Success of $\ell_1$ minimization turns out to be intimately related to the neighborliness properties of
the polytope $AC^n$.

Theorem 7 (Donoho, 2005). Fix $\delta \in (0,1)$. For each $n \in \mathbb{N}$, let $m(n) = \lfloor n\delta \rfloor$ and $A(n) \in \mathbb{R}^{m(n)\times n}$
be a random matrix. Then the sequence $\{A(n)C^n\}_{n\ge 0}$ has weak neighborliness $\rho$ in probability if and
only if the following happens:

1. For any $\rho_- < \rho$, there exists $\varepsilon_n \downarrow 0$ such that, for a fraction larger than $(1-\varepsilon_n)$ of vectors $x_0$
with $\|x_0\|_0 = m(n)\rho_-$, the $\ell_1$ minimization succeeds with high probability (with respect to the
choice of the random matrix $A(n)$).

2. Vice versa, for any $\rho_+ > \rho$, there exists $\varepsilon_n \downarrow 0$ such that, for a fraction larger than $(1-\varepsilon_n)$ of
vectors $x_0$ with $\|x_0\|_0 = m(n)\rho_+$, the $\ell_1$ minimization fails with high probability (with respect
to the choice of the random matrix $A(n)$).

This is indeed a rephrasing of Theorem 2 in [Don05b].

In view of this result, Theorem 2 follows from the following result on compressed sensing with
random sensing matrices.

$^3$ As customary in this domain, we denote by $\|v\|_0$ the number of non-zero entries in the vector $v$ (which of
course is not a norm).
Theorem 8. Fix $\delta \in (0,1)$. For each $n \in \mathbb{N}$, let $m(n) = \lfloor n\delta \rfloor$ and define $A(n) \in \mathbb{R}^{m(n)\times n}$ to be
a random matrix with independent subgaussian entries, with mean 0, variance $1/m$, and common
scale factor $s/m$. Further assume $A_{ij}(n) = \tilde A_{ij}(n) + \nu_0\, G_{ij}(n)$ where $\nu_0 > 0$ is independent of $n$ and
$\{G_{ij}(n)\}_{i\in[m],\, j\in[n]}$ is a collection of i.i.d. $N(0, 1/m)$ random variables independent of $\tilde A(n)$.
Consider either of the following two cases:

1. The matrix $A(n)$ has i.i.d. entries and $\{x_0(n)\}_{n\ge 1}$ is any fixed sequence of vectors with
$\lim_{n\to\infty} \|x_0(n)\|_0/m(n) = \rho$.

2. The matrix $A(n)$ has independent but not identically distributed entries. The vectors $x_0(n)$
have i.i.d. entries independent of $A(n)$, with $\mathbb{P}\{x_{0,i}(n) \ne 0\} = \rho\delta$.

Then the following holds. If $\rho < \rho_*(\delta)$ then $\ell_1$ minimization succeeds with high probability. Vice versa,
if $\rho > \rho_*(\delta)$, then $\ell_1$ minimization fails with high probability. (Here probability is with respect to the
realization of the random matrix $A(n)$ and, possibly, of $x_0(n)$.)

The rest of this section is devoted to the proof of Theorem 8. Indeed, as shown below, this
immediately implies Theorem 2.

Proof of Theorem 2. Take $x_0(n)$ to be a sequence of independent vectors with independent entries
such that $\mathbb{P}_\rho\{x_0(n)_i = 1\} = \rho\delta$ and $\mathbb{P}_\rho\{x_0(n)_i = 0\} = 1 - \rho\delta$. Then, by the law of large numbers we
have $\lim_{n\to\infty} \|x_0(n)\|_0/m(n) = \rho$ almost surely. Let $A(n) \in \mathbb{R}^{m(n)\times n}$ be a matrix with i.i.d. entries
as per Hypothesis 1 above, with $m(n) = \lfloor n\delta \rfloor$ and $y(n) = A(n)x_0(n)$. Applying Theorem 8, we have,
for any $\rho_- < \rho_*(\delta)$ and $\rho_+ > \rho_*(\delta)$,

\[ \lim_{n\to\infty} \mathbb{P}_{\rho_-}\{\hat x(y(n)) = x_0(n)\} = 1, \tag{5.2} \]
\[ \lim_{n\to\infty} \mathbb{P}_{\rho_+}\{\hat x(y(n)) = x_0(n)\} = 0, \tag{5.3} \]

where $\mathbb{P}_{\rho_\pm}\{\,\cdot\,\}$ denotes probability with respect to the law just described when $\rho = \rho_\pm$. Let
$V(\rho; m, n)$ be the fraction of vectors $x_0$ with $\|x_0\|_0 = \lfloor m\rho \rfloor$ on which $\ell_1$ reconstruction succeeds.
Since in Eqs. (5.2), (5.3) the support of $x_0(n)$ is uniformly random given its size, and the probability of
success is monotone decreasing in the support size [Don05b], the above equations imply

\[ \lim_{n\to\infty} \mathbb{E}\{V(\rho_-; m, n)\} = 1, \tag{5.4} \]
\[ \lim_{n\to\infty} \mathbb{E}\{V(\rho_+; m, n)\} = 0. \tag{5.5} \]

Using Markov inequality, Eqs. (5.4), (5.5) coincide (respectively) with assumptions 1 and 2 in
Theorem 7. The claim follows by applying this theorem. □

Let us now turn to the proof of Theorem 8. The following Lemma provides a useful sufficient
condition for successful reconstruction. Here and below, for a convex function $F : \mathbb{R}^q \to \mathbb{R}$, $\partial F(x)$
denotes the subgradient of $F$ at $x \in \mathbb{R}^q$. In particular $\partial\|x\|_1$ denotes the subgradient of the $\ell_1$ norm
at $x$. Further, for $R \subseteq [n]$, $A_R$ denotes the submatrix of $A$ formed by columns with index in $R$.
The singular values of a matrix $M \in \mathbb{R}^{d_1\times d_2}$ are denoted by $\sigma_{\max}(M) = \sigma_1(M) \ge \sigma_2(M) \ge \cdots \ge
\sigma_{\min(d_1,d_2)}(M) = \sigma_{\min}(M)$.

Lemma 5. For any $c_1, c_2, c_3 > 0$, there exists $\varepsilon_0(c_1, c_2, c_3) > 0$ such that the following happens. If
$x_0 \in \mathbb{R}^n$, $A \in \mathbb{R}^{m\times n}$, $y = Ax_0 \in \mathbb{R}^m$, are such that

1. There exist $v \in \partial\|x_0\|_1$ and $z \in \mathbb{R}^m$ with $v = A^T z + w$ and $\|w\|_2 \le \sqrt{n}\,\varepsilon$, with $\varepsilon \le \varepsilon_0(c_1, c_2, c_3)$.

2. For $c \in (0,1)$, let $S(c) \equiv \{i \in [n] : |v_i| \ge 1 - c\}$. Then, for any $S' \subseteq [n]$, $|S'| \le c_1 n$, the
minimum singular value of $A_{S(c_1)\cup S'}$ satisfies $\sigma_{\min}(A_{S(c_1)\cup S'}) \ge c_2$.

3. The maximum singular value of $A$ satisfies $c_3^{-1} \le \sigma_{\max}(A)^2 \le c_3$.

Then $x_0$ is the unique minimizer of $\|x\|_1$ over $x \in \mathbb{R}^n$ such that $y = Ax$.

The proof of this lemma is deferred to Appendix B.

The proof of Theorem 8 consists of two parts. For $\rho > \rho_*(\delta)$, we shall exhibit a vector $x$ with
$\|x\|_1 < \|x_0\|_1$ and $y = Ax$. For $\rho < \rho_*(\delta)$ we will show that the assumptions of Lemma 5 hold. In
particular, we will construct a subgradient $v$ as per assumption 1. In both tasks, we will use an
iterative message passing algorithm analogous to the one in Section 4. The algorithm is defined by
the following recursion initialized with $x^0 = 0$:

\[ x^{t+1} = \eta(x^t + A^T z^t; \alpha\sigma_t), \tag{5.6} \]
\[ z^t = y - Ax^t + b_t z^{t-1}, \tag{5.7} \]

where $\eta(u;\theta) = \mathrm{sign}(u)\,(|u| - \theta)_+$, $\alpha$ is a non-negative constant, and $b_t$ is a diagonal matrix whose
precise definition is immaterial here and will be given in the proof of Proposition 14 below. Notice
two important differences with respect to the treatment in Section 4:

• The iteration in Eqs. (5.6), (5.7) does not immediately take the form in Eqs. (4.3), (4.4). For
instance the nonlinear mapping $\eta(\,\cdot\,; \alpha\sigma_t)$ is applied after multiplication by $A^T$. This mismatch
can be resolved by a simple change of variables.

• The nonlinear mapping $\eta(\,\cdot\,; \alpha\sigma_t)$ is not a polynomial. This point will be addressed by
constructing suitable polynomial approximations of $\eta$.

We refer to Appendix A for further details.
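
To make the recursion (5.6), (5.7) concrete, here is a minimal pure-Python sketch of a soft-thresholding message passing iteration. It is an illustration under stated assumptions, not the construction used in the proofs: the noise level $\sigma_t$ is replaced by the empirical estimate $\|z^t\|_2/\sqrt{m}$, the diagonal matrix $b_t$ is replaced by the scalar (number of non-zero entries of the current estimate)/$m$, and the names `soft_threshold` and `amp` as well as the problem sizes are hypothetical.

```python
import math
import random


def soft_threshold(u, theta):
    """eta(u; theta) = sign(u) * max(|u| - theta, 0)."""
    return math.copysign(max(abs(u) - theta, 0.0), u)


def amp(A, y, alpha, iters):
    """Iterate x <- eta(x + A^T z; alpha * sigma), z <- y - A x + b * z.

    Assumptions of this sketch: sigma is estimated by ||z||_2 / sqrt(m),
    and b is the scalar (number of non-zeros of the new x) / m.
    """
    m, n = len(A), len(A[0])
    x = [0.0] * n
    z = list(y)  # z^0 = y - A x^0 with x^0 = 0
    for _ in range(iters):
        sigma = math.sqrt(sum(v * v for v in z) / m)
        theta = alpha * sigma
        # effective observation x^t + A^T z^t
        pseudo = [x[j] + sum(A[i][j] * z[i] for i in range(m)) for j in range(n)]
        x = [soft_threshold(v, theta) for v in pseudo]
        b = sum(1 for v in x if v != 0.0) / m
        Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(m)]
        z = [y[i] - Ax[i] + b * z[i] for i in range(m)]
    return x


rng = random.Random(1)  # fixed seed, illustrative sizes only
n, m, k = 60, 30, 3
A = [[rng.gauss(0.0, 1.0 / math.sqrt(m)) for _ in range(n)] for _ in range(m)]
x0 = [rng.gauss(0.0, 1.0) if j < k else 0.0 for j in range(n)]
y = [sum(A[i][j] * x0[j] for j in range(n)) for i in range(m)]
x_hat = amp(A, y, alpha=1.5, iters=20)
```

The entries of `A` are drawn with variance $1/m$, matching Hypothesis 1; any other choice of `alpha` or of the sizes is equally arbitrary here.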

For $t \ge 0$, $\sigma_t$ is defined by the one-dimensional recursion

\[ \sigma_{t+1}^2 = \frac{1}{\delta}\,\mathbb{E}\{[\eta(X + \sigma_t Z; \alpha\sigma_t) - X]^2\}, \tag{5.8} \]

where expectation is with respect to the independent random variables $Z \sim N(0,1)$, $X \sim p_X$, and
$\sigma_0^2 = \mathbb{E}\{X^2\}/\delta$.
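
The recursion (5.8) can be iterated in closed form when $p_X$ is the sparse Gaussian mixture $(1-\varepsilon)\delta_0 + \varepsilon\,N(0,1)$ considered later in this section. The sketch below is a hedged illustration: the closed-form expressions are derived here (via Gaussian integration by parts) rather than taken from the text, the helper names are hypothetical, and the parameter values in the usage lines are arbitrary.

```python
import math


def gauss_tail(a):
    """P{Z <= -a} for Z ~ N(0, 1)."""
    return 0.5 * math.erfc(a / math.sqrt(2.0))


def gauss_density(a):
    return math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)


def second_moment(a):
    """E{eta(Z; a)^2} for Z ~ N(0, 1)."""
    return 2.0 * ((1.0 + a * a) * gauss_tail(a) - a * gauss_density(a))


def se_step(sigma2, alpha, eps, delta):
    """One step of (5.8) for p_X = (1 - eps) * delta_0 + eps * N(0, 1)."""
    theta = alpha * math.sqrt(sigma2)
    # X = 0: the error is eta(sigma Z; alpha sigma)^2 = sigma^2 eta(Z; alpha)^2
    mse_zero = sigma2 * second_moment(alpha)
    # X ~ N(0, 1): U = X + sigma Z ~ N(0, s2), and E{X eta(U)} = P{|U| > theta}
    s2 = 1.0 + sigma2
    tau = theta / math.sqrt(s2)
    mse_gauss = s2 * second_moment(tau) - 4.0 * gauss_tail(tau) + 1.0
    # clamp at zero to absorb floating-point cancellation for tiny sigma2
    return max(((1.0 - eps) * mse_zero + eps * mse_gauss) / delta, 0.0)


def state_evolution(alpha, eps, delta, iters):
    sigma2 = eps / delta  # sigma_0^2 = E{X^2} / delta
    path = [sigma2]
    for _ in range(iters):
        sigma2 = se_step(sigma2, alpha, eps, delta)
        path.append(sigma2)
    return path


path_sparse = state_evolution(alpha=1.5, eps=0.05, delta=0.5, iters=40)
path_dense = state_evolution(alpha=0.6, eps=0.4, delta=0.5, iters=40)
```

With these illustrative parameters the first sequence decays geometrically towards zero while the second stays bounded away from zero, which mirrors the two regimes distinguished in Lemma 6.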

Proposition 14. Let $\{(x_0(n), A(n), y(n))\}_{n\ge 0}$ be a sequence of triples with $A(n)$ random as per
Hypothesis 1, $\{x_{0,i}(n) : i \in [n]\}$ independent and identically distributed with $x_{0,i}(n) \sim p_X$ a finite
mixture of Gaussians on $\mathbb{R}$, and $y(n) = A(n)x_0(n)$.

Then, for each $n$ there exists a sequence of vectors $\{x^t(n), z^t(n)\}_{t\ge 0}$, with $x^t(n) = x^t \in \mathbb{R}^n$,
$z^t(n) = z^t \in \mathbb{R}^m$, such that the following happens for every $t$.

1. There exists a diagonal matrix $b_t = b_t(n)$ such that

\[ z^t = y - Ax^t + b_t z^{t-1}, \tag{5.9} \]
\[ \lim_{n\to\infty} \max_{i\in[m]} (b_t)_{ii} = \lim_{n\to\infty} \min_{i\in[m]} (b_t)_{ii} = \frac{1}{\delta}\,\mathbb{P}\{|X + \sigma_{t-1}Z| \ge \alpha\sigma_{t-1}\}, \tag{5.10} \]

where the limit holds in probability.

2. In probability

\[ \lim_{n\to\infty} \frac{1}{n}\,\|x^{t+1} - \eta(x^t + A^T z^t; \alpha\sigma_t)\|_2^2 = 0. \tag{5.11} \]

3. For any locally Lipschitz function $\psi : \mathbb{R}\times\mathbb{R} \to \mathbb{R}$, $|\psi(x,y)| \le C(1 + x^2 + y^2)$, in probability

\[ \lim_{n\to\infty} \frac{1}{n} \sum_{i=1}^{n} \psi(x_{0,i},\, x_i^t + (A^T z^t)_i) = \mathbb{E}\,\psi(X, X + \sigma_t Z). \tag{5.12} \]

4. There exist two functions $o(a;c)$ and $o(a,b;c)$, with $o(a;c) \to 0$, $o(a,b;c) \to 0$ as $c \to 0$ at
$a, b$ fixed, such that the following holds. Assume $A_{ij}(n) = \tilde A_{ij}(n) + \nu G_{ij}(n)$ where $\nu > 0$ is
independent of $n$ and $\{G_{ij}(n)\}_{i\in[m],\, j\in[n]}$ is a collection of i.i.d. $N(0, 1/m)$ random variables
independent of $\tilde A(n)$. Then there exists a sequence of vectors $\{\tilde x^t, \tilde z^t\}_{t\ge 0}$ that is independent of
$G$ such that, for any $t \ge 0$,

\[ \frac{1}{n} \sum_{i=1}^{n} \mathbb{E}\{((x^t + A^T z^t)_i - (\tilde x^t + \tilde A^T \tilde z^t)_i)^2\} \le o(t;\nu) + o(t,\nu;n^{-1}), \tag{5.13} \]
\[ \frac{1}{m} \sum_{i=1}^{m} \mathbb{E}\{(z_i^t - \tilde z_i^t)^2\} \le o(t;\nu) + o(t,\nu;n^{-1}). \tag{5.14} \]

The proof is deferred to Appendix A.

We also need a generalization of the last proposition for functions of the estimates $x^t$, $x^s$ at two
distinct iteration numbers $t \ne s$. To this end, we introduce the generalization of the state
evolution equation (5.8). Namely, we define $\{R_{s,t}\}_{s,t\ge 0}$ recursively for all $s,t \ge 0$ by letting

\[ R_{s+1,t+1} = \frac{1}{\delta}\,\mathbb{E}\{[\eta(X + Z_s; \alpha\sigma_s) - X][\eta(X + Z_t; \alpha\sigma_t) - X]\}. \tag{5.15} \]

Here the expectation is with respect to $X \sim p_X$ and the independent Gaussian vector $[Z_s, Z_t]$ with
zero mean and covariance given by $\mathbb{E}\{Z_s^2\} = R_{s,s}$, $\mathbb{E}\{Z_t^2\} = R_{t,t}$ and $\mathbb{E}\{Z_t Z_s\} = R_{t,s}$. The boundary
condition is fixed by letting $R_{0,0} = \mathbb{E}\{X^2\}/\delta$ and defining, for each $t \ge 0$,

\[ R_{0,t+1} = \frac{1}{\delta}\,\mathbb{E}\{[\eta(X + Z_t; \alpha\sigma_t) - X][-X]\}, \tag{5.16} \]

with $Z_t \sim N(0, R_{t,t})$. This uniquely determines the doubly infinite array $\{R_{t,s}\}_{t,s\ge 0}$. Notice in
particular that $R_{t,t} = \sigma_t^2$ for all $t \ge 0$. (This is easily checked by induction over $t$.)

Proposition 15. Under the assumptions of Proposition 14 the sequence $\{x^t(n), z^t(n)\}_{t\ge 0}$ constructed
there further satisfies the following. For any fixed $t, s \ge 0$, and any Lipschitz continuous functions
$\psi : \mathbb{R}\times\mathbb{R}\times\mathbb{R} \to \mathbb{R}$, $\phi : \mathbb{R}\times\mathbb{R} \to \mathbb{R}$, in probability

\[ \lim_{n\to\infty} \frac{1}{n} \sum_{i=1}^{n} \psi(x_{0,i},\, x_i^s + (A^T z^s)_i,\, x_i^t + (A^T z^t)_i) = \mathbb{E}\,\psi(X, X + Z_s, X + Z_t), \tag{5.17} \]
\[ \lim_{n\to\infty} \frac{1}{m} \sum_{i=1}^{m} \phi(z_i^s, z_i^t) = \mathbb{E}\,\phi(Z_s, Z_t), \tag{5.18} \]

where expectation is with respect to $X \sim p_X$ and the independent Gaussian vector $(Z_s, Z_t)$ with zero
mean and covariance given by $\mathbb{E}\{Z_s^2\} = R_{s,s}$, $\mathbb{E}\{Z_t^2\} = R_{t,t}$ and $\mathbb{E}\{Z_t Z_s\} = R_{t,s}$.

The proof of this proposition is in Appendix A.

Finally, we need some analytical estimates on the recursions (5.8) and (5.15). Part of these
estimates were already proved in [DMM09, DMM11, BM12], but we reproduce them here for the
reader's convenience. Proofs of the others are provided in Appendix C.

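Lemma 6. Let $p_X$ be a probability measure on the real line such that $p_X(\{0\}) = 1-\varepsilon$ and $\mathbb{E}_{p_X}\{X^2\} < \infty$,
fix $\delta \in (0,1)$ and set $\rho = \varepsilon/\delta$. For this choice of parameters, consider the sequences $\{\sigma_t^2\}_{t\ge 0}$,
$\{R_{s,t}\}_{s,t\ge 0}$ defined as per Eqs. (5.8), (5.15).

If $\rho < \rho_*(\delta)$ then

(a1) There exist $\alpha_1(\varepsilon,\delta)$, $\alpha_2(\varepsilon,\delta)$, $\alpha_*(\varepsilon)$ with $0 < \alpha_1(\varepsilon,\delta) < \alpha_*(\varepsilon) < \alpha_2(\varepsilon,\delta) < \infty$, and $\omega_*(\varepsilon,\delta) \in (0,1)$
such that the following happens. For each $\alpha \in (\alpha_1, \alpha_2)$, $\sigma_t^2 = B\omega^t(1 + o_t(1))$ as $t \to \infty$,
with $\omega \in (0,1)$.

Further, for each $\omega \in [\omega_*(\varepsilon,\delta), 1)$ there exist $\alpha_- \in (\alpha_1, \alpha_*]$ and $\alpha_+ \in [\alpha_*, \alpha_2)$ (distinct as long
as $\omega > \omega_*$) such that, letting $\alpha \in \{\alpha_-, \alpha_+\}$, $\sigma_t^2 = B\omega^t(1 + o_t(1))$.

Finally, for all $\alpha \in [\alpha_*, \alpha_2)$, we have $\varepsilon + 2(1-\varepsilon)\Phi(-\alpha) < \delta$.

(a2) For any $\alpha \in [\alpha_*(\varepsilon), \alpha_2(\varepsilon,\delta))$, we have $\lim_{t\to\infty} R_{t,t-1}/(\sigma_t\sigma_{t-1}) = 1$.

(a3) Assume $p_X$ to be such that $\max(p_X((0,a)), p_X((-a,0))) \le Ba^b$ for some $B, b > 0$ (in particular
this is the case if $p_X$ has an atom at 0 and is absolutely continuous in a neighborhood of 0).
Fixing again $\alpha \in [\alpha_*(\varepsilon), \alpha_2(\varepsilon,\delta))$, and $c \in \mathbb{R}_+$,

\[ \lim_{t_0\to\infty} \sup_{t,s\ge t_0} \mathbb{P}\{|X + Z_s| \ge c\sigma_s;\ |X + Z_t| < c\sigma_t\} = 0, \tag{5.19} \]

where $(Z_s, Z_t)$ is a Gaussian vector with $\mathbb{E}\{Z_s^2\} = \sigma_s^2$, $\mathbb{E}\{Z_t^2\} = \sigma_t^2$, $\mathbb{E}\{Z_s Z_t\} = R_{s,t}$.

Vice versa, if $\rho > \rho_*(\delta)$, then there exists $\alpha_0(\delta, p_X) > \alpha_{\min}(\delta) > 0$ such that

(b1) For any $\alpha > \alpha_{\min}(\delta)$, we have $\lim_{t\to\infty} \sigma_t^2 = \sigma_*^2 > 0$ and, for $\alpha \ge \alpha_0$,
$\lim_{t\to\infty} [R_{t,t} - 2R_{t,t-1} + R_{t-1,t-1}] = 0$.

(b2) Letting $\alpha = \alpha_0(\delta, p_X)$, we have $\mathbb{P}\{|X + \sigma_* Z| \ge \alpha\sigma_*\} = \delta$.

(b3) Consider the probability distribution $p_X = (1-\varepsilon)\delta_0 + \varepsilon\gamma$ with $\gamma(dx) = \exp(-x^2/2)/\sqrt{2\pi}\, dx$
the standard Gaussian measure. Then, setting $\alpha = \alpha_0(\delta, p_X)$, we have
$\lim_{t\to\infty} \mathbb{E}\{|\eta(X + \sigma_t Z; \alpha\sigma_t)|\} < \mathbb{E}\{|X|\}$, where $Z \sim N(0,1)$ is independent of $X$.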

We are now in a position to prove Theorem 8. For the reader's convenience, we distinguish
the cases $\rho < \rho_*(\delta)$ and $\rho > \rho_*(\delta)$. Before considering these cases, we will establish some common
simplifications.

5.1 Proof of Theorem 8, common simplifications

Consider first case 1. By exchangeability of the columns of $A(n)$, it is sufficient to prove the claim
for the sequence of random vectors obtained by permuting the entries of $x_0(n)$ uniformly at random.
Hence $x_0(n)$ is a vector with a uniformly random support $\mathrm{supp}(x_0(n)) = S_n$, with deterministic size
$|S_n|$ such that $|S_n|/n \to \varepsilon$. Further, the success of $\ell_1$ minimization is an event that is monotone
decreasing in the support $\mathrm{supp}(x_0(n))$ [Don05b]. Therefore we can replace the deterministic support
size with a random size $|S_n| \sim \mathrm{Binom}(n, \varepsilon)$ (which concentrates tightly around $n\varepsilon$).

Finally, since success of $\ell_1$ minimization only depends on the support of $x_0(n)$ [Don05b], we can
replace the non-zero entries by arbitrary values. We will take advantage of this fact and assume
that all the non-zero entries of $x_0(n)$ are i.i.d. $N(0,1)$. We conclude that it is sufficient to prove
that $\ell_1$ minimization succeeds/fails with high probability if the vectors $x_0(n)$ have i.i.d. entries with
distribution $p_X = (1-\varepsilon)\delta_0 + \varepsilon\gamma$, where $\gamma(dx) = \exp(-x^2/2)/\sqrt{2\pi}\, dx$.

Consider next case 2, in which the entries of $x_0(n)$ are i.i.d. with $\mathbb{P}\{x_{0,i}(n) \ne 0\} = \rho\delta = \varepsilon$. Again,
exploiting the fact that the success of $\ell_1$ minimization depends only on the support of $x_0(n)$, we can
assume that its entries have common distribution $p_X = (1-\varepsilon)\delta_0 + \varepsilon\gamma$.

Summarizing this discussion, in order to prove the Theorem both in case 1 and case 2, it will be
sufficient to do so for the following setting.

Remark 3. In the proof of Theorem 8, we can assume the vectors $x_0(n)$ to be random with i.i.d.
entries with common distribution $p_X = (1-\varepsilon)\delta_0 + \varepsilon\gamma$, and the matrices $A(n)$.



5.2 Proof of Theorem 8, $\rho < \rho_*(\delta)$

Fix $\rho < \rho_*(\delta)$. We will prove that hypotheses 1, 2, 3 of Lemma 5 hold with high probability for
fixed $c_1, c_2, c_3 > 0$, and $\varepsilon$ arbitrarily small. This implies the claim (i.e. that $\ell_1$ minimization succeeds)
by applying the Lemma. Notice that hypothesis 3 holds with high probability for some $c_3 = c_3(\delta)$
by classical estimates on the extreme eigenvalues of sample covariance matrices [BS98, BS05].

We next consider hypothesis 1 of Lemma 5. In order to construct the subgradient $v$ used there, we
consider the sequence of vectors $\{x^t, z^t\}_{t\ge 0}$ defined as per Proposition 14. We fix $\alpha \in (\alpha_1(\varepsilon), \alpha_2(\varepsilon))$
as per Lemma 6.(a1) so that $\sigma_t^2 = B\omega^t(1 + o(1))$ with $\omega \in (0,1)$ to be chosen close enough to 1. Also,
we introduce the notation $\theta_t = \alpha\sigma_t$. We let $v^t \in \mathbb{R}^n$ be defined by

\[ v_i^t = \begin{cases} \mathrm{sign}(x_{0,i}) & \text{if } i \in S, \\ \frac{1}{\theta_{t-1}}\,(x^{t-1} + A^T z^{t-1} - \hat x^t)_i & \text{otherwise,} \end{cases} \tag{5.20} \]
\[ \hat x^t = \eta(x^{t-1} + A^T z^{t-1}; \theta_{t-1}). \tag{5.21} \]

Notice that, by definition of the function $\eta(\,\cdot\,;\,\cdot\,)$, we have $|x_i^{t-1} + (A^T z^{t-1})_i - \hat x_i^t| \le \theta_{t-1}$, and hence
$v^t \in \partial\|x_0\|_1$. We can write

\[ v^t = \frac{1}{\theta_{t-1}}\,A^T z^t + \xi^t + \beta^t + \zeta^t, \tag{5.22} \]
\[ \xi^t = \frac{1}{\theta_{t-1}}\,(x^{t-1} + A^T z^{t-1} - x^t - A^T z^t), \tag{5.23} \]
\[ \beta^t = \frac{1}{\theta_{t-1}}\,(x^t - \hat x^t), \tag{5.24} \]
\[ \zeta_i^t = \begin{cases} \mathrm{sign}(x_{0,i}) - \frac{1}{\theta_{t-1}}\,(x^{t-1} + A^T z^{t-1} - \hat x^t)_i & \text{if } i \in S, \\ 0 & \text{otherwise.} \end{cases} \tag{5.25} \]

This part of the proof is completed by showing that there exists $h(t)$ with $\lim_{t\to\infty} h(t) = 0$ such
that, for each $t$, with high probability we have $\|\xi^t\|_2^2/n \le (1-\sqrt{\omega})^2/\alpha^2 + h(t)$, $\|\beta^t\|_2^2/n \le h(t)$, and
$\|\zeta^t\|_2^2/n \le h(t)$. Indeed, if this is true, we can then choose $t$ sufficiently large and $\alpha \in (\alpha_*(\varepsilon), \alpha_2(\varepsilon,\delta))$
so that $\|\xi^t + \beta^t + \zeta^t\|_2^2/n$ is small enough as to satisfy condition 1 of Lemma 5.

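First consider $\xi^t$. Applying Proposition 15 to $\psi(x, y_1, y_2) = (y_1 - y_2)^2$, we have, in probability

\[ \lim_{n\to\infty} \frac{1}{n}\,\|\xi^t\|_2^2 = \lim_{n\to\infty} \frac{1}{n\alpha^2\sigma_{t-1}^2}\,\|x^t + A^T z^t - x^{t-1} - A^T z^{t-1}\|_2^2 \]
\[ = \frac{1}{\alpha^2\sigma_{t-1}^2}\,\big(R_{t,t} - 2R_{t,t-1} + R_{t-1,t-1}\big) \]
\[ = \frac{1}{\alpha^2\sigma_{t-1}^2}\,\big[\sigma_t^2 - 2\sigma_t\sigma_{t-1} + \sigma_{t-1}^2\big] + \frac{2\sigma_t}{\alpha^2\sigma_{t-1}}\,\Big[1 - \frac{R_{t,t-1}}{\sigma_t\sigma_{t-1}}\Big] \]
\[ = \frac{1}{\alpha^2}\,(1 - \sqrt{\omega})^2 + h(t). \]

Here the last equality follows from the fact that $\sigma_t^2/\sigma_{t-1}^2 \to \omega$ by Lemma 6.(a1) and $R_{t,t-1}/(\sigma_t\sigma_{t-1}) \to 1$
by Lemma 6.(a2). This implies the claim for $\xi^t$.

Next, consider $\beta^t$. By Proposition 14.2

\[ \lim_{n\to\infty} \frac{1}{n}\,\|x^t - \hat x^t\|_2^2 = \lim_{n\to\infty} \frac{1}{n}\,\|x^t - \eta(x^{t-1} + A^T z^{t-1}; \alpha\sigma_{t-1})\|_2^2 = 0, \]

and hence $\|\beta^t\|_2^2/n \le h(t)$ with high probability for any $h(t) > 0$.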
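Finally consider $\zeta^t$, and define $R(y;\theta) \equiv [y - \eta(y;\theta)]/\theta$. We have

\[ R(y;\theta) = \begin{cases} +1 & \text{for } y \ge \theta, \\ y/\theta & \text{for } -\theta < y < \theta, \\ -1 & \text{for } y \le -\theta. \end{cases} \tag{5.26} \]

Using Proposition 14.3, we can show that

\[ \lim_{n\to\infty} \frac{1}{n}\,\|\zeta^t\|_2^2 = \mathbb{E}\{[\mathrm{sign}(X) - R(X + \sigma_{t-1}Z; \alpha\sigma_{t-1})]^2\, 1_{X\ne 0}\}. \tag{5.27} \]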

Notice that this apparently requires applying Proposition 14 to the function $\psi(x,y) = [\mathrm{sign}(x) -
R(y;\theta)]^2\, 1_{x\ne 0}$, which is non-Lipschitz in $x$. However we can define a Lipschitz approximation, with
parameter $r > 0$:

\[ \psi_r(x,y) = \begin{cases} [x/r - R(y;\theta)]^2\, |x|/r & \text{for } |x| \le r, \\ [\mathrm{sign}(x) - R(y;\theta)]^2 & \text{for } |x| > r. \end{cases} \tag{5.28} \]

Notice that $\psi_r$ is bounded and Lipschitz continuous. We further have $|\psi_r(x,y) - \psi(x,y)| \le 4\cdot 1(x \ne 0;\ |x| \le r)$,
whence

\[ \limsup_{n\to\infty} \Big| \frac{1}{n}\,\|\zeta^t\|_2^2 - \frac{1}{n} \sum_{i=1}^{n} \psi_r(x_{0,i},\, x_i^{t-1} + (A^T z^{t-1})_i) \Big| \le \limsup_{n\to\infty} \frac{4}{n} \sum_{i=1}^{n} 1(x_{0,i} \ne 0;\ |x_{0,i}| \le r) \le 8r. \tag{5.29} \]

The last inequality holds almost surely by the law of large numbers using $\gamma([-r,r]) \le 2r$. Analogously

\[ |\mathbb{E}\,\psi(X, X + \sigma_{t-1}Z) - \mathbb{E}\,\psi_r(X, X + \sigma_{t-1}Z)| \le 4\,\mathbb{P}(X \ne 0;\ |X| \le r) \le 8r. \tag{5.30} \]

Hence the claim (5.27) follows by applying Proposition 14.3 to $\psi_r(x,y)$, using Eqs. (5.29), (5.30),
and letting $r \to 0$.

We conclude by noting that the right-hand side of Eq. (5.27) converges to 0 as $t \to \infty$ by
dominated convergence, since $\sigma_t \to 0$. Therefore

\[ \lim_{t\to\infty} \lim_{n\to\infty} \frac{1}{n}\,\|\zeta^t\|_2^2 = 0. \]

This completes our proof of assumption 1 of Lemma 5.

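We finally consider hypothesis 2. Let $S_t(c)$ be defined as there, for the subgradient $v^t$, namely

\[ S_t(c) \equiv \{i \in [n] : |v_i^t| \ge 1 - c\} = S \cup \{i \in [n]\setminus S : |x_i^{t-1} + (A^T z^{t-1})_i| \ge (1-c)\theta_{t-1}\}. \]

Recall that by assumption $A_{ij} = \tilde A_{ij} + \nu G_{ij}$ whereby $G_{ij} \sim N(0, 1/m)$ and (possibly redefining $\tilde A_{ij}$)
we can freely choose $\nu \in [0, \nu_0]$. Let $\{\tilde x^t, \tilde z^t\}_{t\ge 0}$ be a sequence of vectors defined as per Proposition
14.4, and define $\tilde v^t$ as $v^t$, but replacing $x^t$, $z^t$, $A$ by $\tilde x^t$, $\tilde z^t$, $\tilde A$:

\[ \tilde v_i^t = \begin{cases} \mathrm{sign}(x_{0,i}) & \text{if } i \in S, \\ \frac{1}{\theta_{t-1}}\,(\tilde x^{t-1} + \tilde A^T \tilde z^{t-1} - \hat{\tilde x}^t)_i & \text{otherwise,} \end{cases} \tag{5.31} \]
\[ \hat{\tilde x}^t = \eta(\tilde x^{t-1} + \tilde A^T \tilde z^{t-1}; \theta_{t-1}). \tag{5.32} \]

We further define

\[ \tilde S_t(c) \equiv \{i \in [n] : |\tilde v_i^t| \ge 1 - c\} = S \cup \{i \in [n]\setminus S : |\tilde x_i^{t-1} + (\tilde A^T \tilde z^{t-1})_i| \ge (1-c)\theta_{t-1}\}. \]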

We claim that the following two claims hold for some $t_* \ge 0$ independent of $n$:

Claim 1. There exist $c_1, c_2 > 0$ (independent of $\nu$) such that for all $S' \subseteq [n]$, $|S'| \le 2c_1 n$, the minimum
singular value of $A_{\tilde S_{t_*}(2c_1)\cup S'}$ satisfies $\sigma_{\min}(A_{\tilde S_{t_*}(2c_1)\cup S'}) \ge c_2\nu$ with probability converging to
1 as $n \to \infty$.

Claim 2. For all $t \ge t_*$,

\[ \mathbb{P}\{|S_t(c_1) \setminus \tilde S_{t_*}(2c_1)| \ge n c_1\} = o_1(t_*;\nu) + o_2(t_*,\nu;n^{-1}), \]

where $o_1(t_*;\nu)$ vanishes as $\nu \to 0$ at $t_*, c_1, c_2$ fixed, and $o_2(t_*,\nu;n^{-1})$ vanishes as $n^{-1} \to 0$ at
$\nu, c_1, c_2$ fixed.

These claims immediately imply that hypothesis 2 of Lemma 5 holds with probability converging
to one as $n \to \infty$. Indeed, if $|S'| \le n c_1$, then (by Claim 2) $S_t(c_1) \cup S' \subseteq \tilde S_{t_*}(2c_1) \cup S''$ where
$|S''| \le 2n c_1$ with probability larger than $1 - o_1(t_*;\nu) - o_2(t_*,\nu;n^{-1})$. By Claim 1, we hence have
$\sigma_{\min}(A_{S_t(c_1)\cup S'}) \ge c_2' \equiv c_2\nu$. The thesis follows since $\nu$ can be chosen as small as we want. (Notice
that once $t_*$ is fixed to satisfy these claims, we can still choose $t \ge t_*$ arbitrarily to satisfy hypothesis
1 of Lemma 5, as per the argument above.)
In order to prove Claim 1 above, first notice that, for any $b > 0$,

\[ \mathbb{P}\Big\{ \min_{|S'|\le 2c_1 n} \sigma_{\min}(A_{\tilde S_{t_*}(2c_1)\cup S'}) < c_2\nu \Big\} \]
\[ \le \mathbb{P}\Big\{ \min_{|S'|\le 2c_1 n} \sigma_{\min}(A_{\tilde S_{t_*}(2c_1)\cup S'}) < c_2\nu;\ |\tilde S_{t_*}(2c_1)| \le bn \Big\} + \mathbb{P}\{|\tilde S_{t_*}(2c_1)| > bn\} \]
\[ \le e^{nH(2c_1)} \max_{|S'|\le 2c_1 n} \mathbb{P}\{\sigma_{\min}(A_{\tilde S_{t_*}(2c_1)\cup S'}) < c_2\nu;\ |\tilde S_{t_*}(2c_1)| \le bn\} + \mathbb{P}\{|\tilde S_{t_*}(2c_1)| > bn\}, \tag{5.33} \]

where in the last line $H(c)$ denotes the binary entropy of $c$ and we used $\binom{n}{nc} \le \exp\{nH(c)\}$. We
want to show that $t_*, b, c_1, c_2, \nu$ can be chosen so that both contributions vanish as $n \to \infty$.

Consider any $b \in (0,\delta)$ and restrict $c_1 \in (0, (\delta - b)/2)$. Then the matrix $A_{\tilde S_{t_*}(2c_1)\cup S'}$ has $n\delta$ rows
and $n\delta - \Theta(n)$ columns. Further $A = \tilde A + \nu G$ with $\tilde S_{t_*}(2c_1)$ (and hence $\tilde S_{t_*}(2c_1) \cup S'$) independent of $G$.
We can therefore use an upper bound on the condition number of randomly perturbed deterministic
matrices proved by Bürgisser and Cucker [BC10] (see also Appendix D) to show that

\[ \mathbb{P}\{\sigma_{\min}(A_{\tilde S_{t_*}(2c_1)\cup S'}) < c_2\nu;\ |\tilde S_{t_*}(2c_1)| \le bn\} \le (a_1 c_2)^{n(\delta - b - 2c_1)} \tag{5.34} \]

with $a_1 = a_1((b + 2c_1)/\delta)$ bounded as long as $(b + 2c_1)/\delta < 1$. We can therefore select $c_2 = 1/(2a_1)$
and select $c_1$ small enough so that $H(2c_1) < (1/2)(\delta - b - 2c_1)\log 2$. This ensures that the first term
in Eq. (5.33) vanishes as $n \to \infty$.
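
The counting bound $\binom{n}{nc} \le \exp\{nH(c)\}$ used in (5.33) is easy to check numerically. A minimal sketch with natural logarithms (the function names and the value of `n` are illustrative assumptions):

```python
import math


def binary_entropy(c):
    """H(c) = -c log c - (1 - c) log(1 - c), natural logarithm."""
    if c in (0.0, 1.0):
        return 0.0
    return -c * math.log(c) - (1.0 - c) * math.log(1.0 - c)


def log_binomial(n, k):
    """log of the binomial coefficient (n choose k)."""
    return math.log(math.comb(n, k))


n = 200
checks = []
for k in range(0, n + 1, 10):
    c = k / n
    # small slack absorbs floating-point rounding at k = 0 and k = n
    checks.append(log_binomial(n, k) <= n * binary_entropy(c) + 1e-9)
print(all(checks))
```

Because the entropy factor grows only like $e^{nH(2c_1)}$ while the probability in (5.34) decays exponentially in $n$, choosing $c_1$ small makes the product vanish.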

We are left with the task of selecting $b \in (0,\delta)$, $t_* \ge 0$, so that the second term vanishes as well,
since then we can take $c_1 \in (0, (\delta - b)/2)$. To this end notice that by Proposition 14 (and using the
fact that $X + \sigma_{t-1}Z$ has a density) we have, in probability,

\[ \lim_{n\to\infty} \frac{1}{n}\,|S_{t_*}(c)| = \mathbb{P}\{|X + \sigma_{t_*-1}Z| \ge (1-c)\theta_{t_*-1}\}, \]

and further, since $\sigma_t \to 0$ as $t \to \infty$ (cf. Lemma 6.(a1)) and $\theta_t = \alpha\sigma_t$, we have

\[ \lim_{t_*\to\infty} \mathbb{P}\{|X + \sigma_{t_*-1}Z| \ge (1-c)\theta_{t_*-1}\} = \varepsilon + 2(1-\varepsilon)\Phi(-(1-c)\alpha). \]

On the other hand, by Lemma 6.(a1), and since $\alpha \in [\alpha_*, \alpha_2)$, we have $\varepsilon + 2(1-\varepsilon)\Phi(-\alpha) < \delta$. Hence
there exist $b_0 \in (0,\delta)$ and $c_1 > 0$ so that for all $t_*$ large enough $|S_{t_*}(3c_1)| \le n b_0$ with high probability.
Taking $b \in (b_0, \delta)$ and using Markov inequality (with $t_*' = t_* - 1$)

\[ \mathbb{P}\{|\tilde S_{t_*}(2c_1)| \ge bn\} \le \frac{1}{(b - b_0)n}\,\mathbb{E}\{|\tilde S_{t_*}(2c_1) \setminus S_{t_*}(3c_1)|\} + \mathbb{P}\{|S_{t_*}(3c_1)| \ge b_0 n\} \]
\[ \le \frac{1}{(b - b_0)\,c_1^2\,\theta_{t_*'}^2\, n} \sum_{i=1}^{n} \mathbb{E}\{((x^{t_*'} + A^T z^{t_*'})_i - (\tilde x^{t_*'} + \tilde A^T \tilde z^{t_*'})_i)^2\} + \mathbb{P}\{|S_{t_*}(3c_1)| \ge b_0 n\} \]
\[ \le o_1(t_*;\nu) + o_2(t_*,\nu;n^{-1}) + \mathbb{P}\{|S_{t_*}(3c_1)| \ge b_0 n\}, \]

where the last inequality follows from Proposition 14.4. All terms can be made arbitrarily small by
choosing $\nu$ small and $n$ large enough.
In order to conclude the proof, we need to show that Claim 2 holds for possibly larger $t_*$. First
notice that, applying again Proposition 14.4, we get

\[ \mathbb{P}\{|S_{t_*}(c_1) \setminus \tilde S_{t_*}(2c_1)| \ge n c_1/2\} \le \frac{2}{n c_1}\,\mathbb{E}\{|S_{t_*}(c_1) \setminus \tilde S_{t_*}(2c_1)|\} \]
\[ \le \frac{2}{n c_1} \sum_{i=1}^{n} \mathbb{P}\{((x^{t_*'} + A^T z^{t_*'})_i - (\tilde x^{t_*'} + \tilde A^T \tilde z^{t_*'})_i)^2 \ge c_1^2\theta_{t_*'}^2\} \le o_1(t_*;\nu) + o_2(t_*,\nu;n^{-1}). \tag{5.35} \]

By Proposition 15 and using the fact that the vector $(X + Z_{t_*}, X + Z_t)$ has a density, we have, in
probability,

\[ \lim_{n\to\infty} \frac{1}{n}\,|S_t(c_1) \setminus S_{t_*}(c_1)| = \mathbb{P}\{|X + Z_{t-1}| \ge (1-c_1)\theta_{t-1};\ |X + Z_{t_*-1}| < (1-c_1)\theta_{t_*-1}\} \le h(t_*), \]

where, by Lemma 6.(a3), $h(t_*)$ vanishes as $t_* \to \infty$. Given any $c_1 > 0$, we can therefore choose $t_*$
so that, with high probability, $|S_t(c_1) \setminus S_{t_*}(c_1)| \le n c_1/2$. Combining with Eq. (5.35), we obtain the
desired Claim.

5.3 Proof of Theorem 8, $\rho > \rho_*(\delta)$

Fix a small number $h > 0$. By Lemma 6.(b), there exists $\Delta = \Delta(\delta,\varepsilon) > 0$ independent of $h$, such
that, for $\alpha = \alpha_0(\delta, p_X)$ and $t$ large enough

\[ \Big| \frac{1}{\delta}\,\mathbb{P}\{|X + \sigma_t Z| \ge \alpha\sigma_t\} - 1 \Big| \le h, \tag{5.36} \]
\[ |R_{t,t} - 2R_{t,t-1} + R_{t-1,t-1}| \le h^2, \tag{5.37} \]
\[ \mathbb{E}\{|\eta(X + \sigma_t Z; \alpha\sigma_t)|\} \le \mathbb{E}\{|X|\} - 2\Delta, \tag{5.38} \]

as well as $\sigma_{t-1}^2 \le 2\sigma_*^2$. By Propositions 14, 15 (and noting that $X + \sigma_t Z$ has a distribution that is
absolutely continuous with respect to Lebesgue measure), we have, with high probability,

\[ \max_{i\in[m]} |(b_t)_{ii} - 1| \le 2h, \tag{5.39} \]
\[ \|z^t - z^{t-1}\|_2 \le 2h\sqrt{n}, \tag{5.40} \]
\[ \|x^t\|_1 \le \|x_0\|_1 - n\Delta, \tag{5.41} \]
\[ \|z^{t-1}\|_2 \le 2\sigma_*\sqrt{n}. \tag{5.42} \]

Namely Eq. (5.36) implies (5.39), Eq. (5.37) implies (5.40), Eq. (5.38) implies (5.41), and the
assumption $\sigma_{t-1}^2 \le 2\sigma_*^2$ implies (5.42).

Using Eq. (5.9) together with the above, we get

\[ \|y - Ax^t\|_2 \le \|z^t - z^{t-1}\|_2 + \max_{i\in[m]} |(b_t)_{ii} - 1|\; \|z^{t-1}\|_2 \le 2h\sqrt{n}\,(1 + 2\sigma_*). \tag{5.43} \]



Define $\tilde x = x^t + A^T(AA^T)^{-1}(y - Ax^t)$ (notice that the sample covariance matrix $AA^T$ has full
rank with high probability [BS98, BS05]). Notice that, by construction, $A\tilde x = y$. Then, with high
probability

\[ \|\tilde x - x^t\|_2 \le \sigma_{\max}(A)\,\sigma_{\min}(A)^{-2}\,\|y - Ax^t\|_2 \le C(\delta)(1 + 2\sigma_*)\,h\sqrt{n}, \tag{5.44} \]

where $\sigma_{\max}(A)$, $\sigma_{\min}(A)$ are the maximum and minimum non-zero singular values of $A$. The second
inequality holds with high probability for $\delta \in (0,1)$ by standard estimates on the singular values of
random matrices [BS98, BS05]. Using Eq. (5.41) together with the triangle inequality and $\|\tilde x - x^t\|_1 \le
\sqrt{n}\,\|\tilde x - x^t\|_2$ we finally get

\[ \|\tilde x\|_1 \le \|x_0\|_1 - n\Delta + C(\delta)(1 + 2\sigma_*)\,hn < \|x_0\|_1, \tag{5.45} \]

where the second inequality follows from the fact that $h > 0$ can be taken arbitrarily small (by
taking $t$ large) while $\Delta$, $C$ and $\sigma_*$ are fixed. We conclude that $x_0$ cannot be the solution of the $\ell_1$
minimization problem (5.1).
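
The correction step $\tilde x = x^t + A^T(AA^T)^{-1}(y - Ax^t)$ moves an approximate solution onto the affine set $\{x : Ax = y\}$ by the smallest possible $\ell_2$ displacement. A minimal pure-Python sketch on a small full-row-rank example follows; the helper names, the elimination routine and the numerical values are illustrative assumptions, not part of the original argument.

```python
def matvec(M, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in M]


def transpose(M):
    return [list(col) for col in zip(*M)]


def solve(M, b):
    """Solve M u = b by Gaussian elimination with partial pivoting
    (M square and invertible)."""
    k = len(M)
    aug = [list(M[i]) + [b[i]] for i in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, k):
            f = aug[r][col] / aug[col][col]
            for c in range(col, k + 1):
                aug[r][c] -= f * aug[col][c]
    u = [0.0] * k
    for i in reversed(range(k)):
        s = aug[i][k] - sum(aug[i][j] * u[j] for j in range(i + 1, k))
        u[i] = s / aug[i][i]
    return u


def project_onto_constraint(A, y, x):
    """Return x + A^T (A A^T)^{-1} (y - A x)."""
    gram = [[sum(a * b for a, b in zip(ri, rj)) for rj in A] for ri in A]
    residual = [yi - axi for yi, axi in zip(y, matvec(A, x))]
    u = solve(gram, residual)
    correction = matvec(transpose(A), u)
    return [xi + ci for xi, ci in zip(x, correction)]


A = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]   # m = 2, n = 3, full row rank
x0 = [1.0, 0.0, 0.0]
y = matvec(A, x0)
x_t = [0.5, 0.2, -0.1]                    # stand-in for an approximate iterate
x_tilde = project_onto_constraint(A, y, x_t)
```

After the correction, `matvec(A, x_tilde)` reproduces `y` up to rounding, which is the property $A\tilde x = y$ used above.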
A Proof of Propositions 14 and 15

In this appendix we prove Propositions 14 and 15 by a suitable application of Theorem 6. Before
passing to these proofs, we establish a corollary of Theorem 6 that allows us to control iterations of the
form (5.6), (5.7), with $\eta(\,\cdot\,;\,\cdot\,)$ replaced by a general polynomial.

A.1 A general corollary

For $x_0 = x_0(n) \in \mathbb{R}^n$ and $A = A(n) \in \mathbb{R}^{m\times n}$ as per Hypothesis 1 in Section 5, we define $y = y(n) \in
\mathbb{R}^m$ by

\[ y = Ax_0. \tag{A.1} \]

Let $D \in \mathbb{R}^{n\times n}$ be the diagonal matrix with diagonal entries equal to the squared column norms of $A$,
that is $D_{ii} = \sum_{j\in[m]} A_{ji}^2$ and $D_{ij} = 0$ for $i \ne j$. Further define $u_0 = u_0(n) \in \mathbb{R}^n$ as follows

\[ u_{0,i} = (D_{ii} - 1)\,x_{0,i} = \Big( \sum_{j\in[m]} A_{ji}^2 - 1 \Big) x_{0,i}. \tag{A.2} \]

Let $x^0 = (I - D^{-1})x_0$ (notice that $D$ is invertible with high probability) and define iteratively

\[ z^t = y - Ax^t + b_t z^{t-1}, \qquad (b_t)_{ii} = \sum_{j\in[n]} A_{ij}^2\, \eta_{t-1}'\big(D_{jj}x_j^{t-1} + (A^T z^{t-1})_j - u_{0,j}\big), \tag{A.3} \]
\[ x^{t+1} = \eta_t(Dx^t + A^T z^t - u_0), \tag{A.4} \]

where, for each $t$, $\eta_t : \mathbb{R} \to \mathbb{R}$ is a polynomial and, for $v \in \mathbb{R}^n$, $\eta_t(v) = (\eta_t(v_1), \dots, \eta_t(v_n))$. Further
$b_t \in \mathbb{R}^{m\times m}$ is a diagonal matrix with entries given as in Eq. (A.3).
We next introduce the corresponding state evolution recursion. Namely, we define $\{R_{s,t}\}_{s,t\ge 0}$
recursively for all $s, t \ge 0$ by letting

\[ R_{s+1,t+1} = \frac{1}{\delta}\,\mathbb{E}\{[\eta_s(X + Z_s) - X][\eta_t(X + Z_t) - X]\}. \tag{A.5} \]

Here expectation is with respect to $X \sim p_X$ and the independent Gaussian vector $[Z_s, Z_t]$ with zero
mean and covariance given by $\mathbb{E}\{Z_s^2\} = R_{s,s}$, $\mathbb{E}\{Z_t^2\} = R_{t,t}$ and $\mathbb{E}\{Z_t Z_s\} = R_{t,s}$. The boundary
condition is fixed by letting $R_{0,0} = \mathbb{E}\{X^2\}/\delta$ and defining, for each $t \ge 0$,

\[ R_{0,t+1} = \frac{1}{\delta}\,\mathbb{E}\{[\eta_t(X + Z_t) - X][-X]\}, \tag{A.6} \]

with $Z_t \sim N(0, R_{t,t})$. This uniquely determines the doubly infinite array $\{R_{t,s}\}_{t,s\ge 0}$.

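Corollary 16. Let $\{(x_0(n), A(n), y(n))\}_{n\ge 0}$ be a sequence of triples with $A(n)$ having independent
subgaussian entries with $\mathbb{E}\{A_{ij}\} = 0$, $\mathbb{E}\{A_{ij}^2\} = 1/m$, $\{x_{0,i}(n) : i \in [n]\}$ independent and identically
distributed with $x_{0,i}(n) \sim p_X$, and $p_X$ a finite mixture of Gaussians. Define $\{x^t, z^t\}_{t\ge 0}$ as per
Eqs. (A.3), (A.4).

Then, for any fixed $t, s \ge 0$, and any Lipschitz continuous functions $\psi : \mathbb{R}\times\mathbb{R}\times\mathbb{R} \to \mathbb{R}$,
$\phi : \mathbb{R}\times\mathbb{R} \to \mathbb{R}$, in probability

\[ \lim_{n\to\infty} \frac{1}{n} \sum_{i=1}^{n} \psi(x_{0,i},\, x_i^s + (A^T z^s)_i,\, x_i^t + (A^T z^t)_i) = \mathbb{E}\,\psi(X, X + Z_s, X + Z_t), \tag{A.7} \]
\[ \lim_{n\to\infty} \frac{1}{m} \sum_{i=1}^{m} \phi(z_i^s, z_i^t) = \mathbb{E}\,\phi(Z_s, Z_t), \tag{A.8} \]

where expectation is with respect to $X \sim p_X$ and the independent Gaussian vector $[Z_s, Z_t]$ with zero
mean and covariance given by $\mathbb{E}\{Z_s^2\} = R_{s,s}$, $\mathbb{E}\{Z_t^2\} = R_{t,t}$ and $\mathbb{E}\{Z_t Z_s\} = R_{t,s}$.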

Proof. Define $\tilde{x}^{t+1} = A^{T} z^t + D x^t - D x_0$. Then Eqs. (A.3), (A.4) read

$$ z^t = A f(\tilde{x}^t, x_0; t) + b_t\, h(z^{t-1}; t-1) , \tag{A.9} $$

$$ \tilde{x}^{t+1} = A^{T} h(z^t; t) + d_t\, f(\tilde{x}^t, x_0; t) , \tag{A.10} $$

where, for $i \in [m]$, $j \in [n]$,

$$ f(x, y; t) = y - \eta_{t-1}(y + x) , \qquad h(z; t) = z , \tag{A.11} $$

$$ (b_t)_{ii} = - \sum_{j\in[n]} A_{ij}^2\, f'(\tilde{x}_j^t, x_{0,j}; t) , \tag{A.12} $$

$$ (d_t)_{jj} = - \sum_{i\in[m]} A_{ij}^2\, h'(z_i^t; t) . \tag{A.13} $$

(Here $f'(x, y; t)$, $h'(x; t)$ denote derivatives with respect to the first argument.) The iteration takes
the same form as in Eqs. (4.3), (4.4) with $Y(i) = x_{0,i}$, and $W(i) = 0$, $B_t = -b_t$ and $D_t = -d_t$.
Further, the initial condition $x^0$ implies $\tilde{x}^0 = -x_0$. Notice that this is dependent on $Y = x_0$, but we
can easily set the initial condition at x 1 = and define f(x, y; t = 0) = -y. We can therefore apply
Theorem 6 and conclude that, in probability

$$ \lim_{n\to\infty} \frac{1}{n} \sum_{i=1}^{n} \psi\big( x_{0,i},\, D_{ii}(x_i^s - x_{0,i}) + (A^{T} z^s)_i,\, D_{ii}(x_i^t - x_{0,i}) + (A^{T} z^t)_i \big) = \mathbb{E}\,\psi(X, Z_s, Z_t) , \tag{A.14} $$

$$ \lim_{n\to\infty} \frac{1}{m} \sum_{i=1}^{m} \phi( z_i^s, z_i^t ) = \mathbb{E}\,\phi(Z_s, Z_t) , \tag{A.15} $$

where expectations are defined as in the statement of the Corollary. The second of these equations
coincides with Eq. (A.8). For the first one, note that $\mathbb{E}\{D_{ii}\} = 1$ and, by a standard Chernoff bound

$$ \lim_{n\to\infty} \max\{ D_{ii} : i \in [n] \} = 1 , \tag{A.16} $$

$$ \lim_{n\to\infty} \min\{ D_{ii} : i \in [n] \} = 1 . \tag{A.17} $$

We therefore get

$$ \lim_{n\to\infty} \frac{1}{n} \sum_{i=1}^{n} \psi\big( x_{0,i},\, (x^s + A^{T} z^s)_i - x_{0,i},\, (x^t + A^{T} z^t)_i - x_{0,i} \big) = \mathbb{E}\,\psi(X, Z_s, Z_t) , \tag{A.18} $$

which coincides with Eq. (A.7) after a redefinition of the function $\psi$. □
A.2 Proofs of Propositions 14 and 15

We will start by proving Proposition 14. Since Proposition 15 follows from the same construction, we
will only point to the necessary modifications. Before presenting the proof, we recall a basic result
in weighted polynomial approximation (here stated for a specific case), see e.g. [Lub07].

Theorem 9. Let $f : \mathbb{R} \to \mathbb{R}$ be a continuous function. Then for any $\kappa, \xi > 0$ there exists a polynomial
$p : \mathbb{R} \to \mathbb{R}$ such that, for all $x \in \mathbb{R}$,

$$ |f(x) - p(x)| \le \xi\, e^{\kappa x^2/2} . \tag{A.19} $$

Proof of Proposition 14. Since the proposition holds as $n \to \infty$ at $t$ fixed, we shall assume throughout
that $t \in \{0, 1, \dots, t_{\max}\}$ for some fixed arbitrarily large $t_{\max}$.

We claim that, for each $\beta, t_{\max} > 0$, we can construct an orbit $\{x^{\beta,t}, z^{\beta,t}\}_{t\ge 0}$ obeying Eqs. (A.3),
(A.4) for suitable functions $\eta_t = \eta_t^{\beta}$ such that the following holds (with a slight abuse of notation
we will drop the parameter $\beta$ from $x^{\beta,t}$, $z^{\beta,t}$). For all $0 \le t \le t_{\max}$, and all functions $\psi$ as in the
statement, we have $z^t = y - A x^t + b_t z^{t-1}$ by construction. Further, in probability,

$$ \lim_{n\to\infty} \max_{i\in[m]} \Big| (b_t)_{ii} - \frac{1}{\delta}\, \mathbb{P}\{ |X + \sigma_{t-1} Z| \ge \alpha\sigma_{t-1} \} \Big| \le \beta , \tag{A.20} $$

$$ \lim_{n\to\infty} \frac{1}{n} \big\| x^{t+1} - \eta( x^t + A^{T} z^t ; \alpha\sigma_t ) \big\|_2^2 \le \beta , \tag{A.21} $$

$$ \lim_{n\to\infty} \Big| \frac{1}{n} \sum_{i=1}^{n} \psi\big( x_{0,i},\, x_i^t + (A^{T} z^t)_i \big) - \mathbb{E}\,\psi(X, X + \sigma_t Z) \Big| \le \beta . \tag{A.22} $$
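Assuming this claim holds, let $\{\beta_\ell\}_{\ell\ge 0}$ be a sequence such that $\lim_{\ell\to\infty} \beta_\ell = 0$. Denote by $\{x^{\ell,t}, z^{\ell,t}\}_{t\ge 0}$
the orbit satisfying Eqs. (A.20), (A.21), (A.22) with $\beta = \beta_\ell$. Let $\eta_t^{\ell} = \eta_t^{\beta_\ell}$ be the corresponding
polynomial, and $b_t^{\ell}$ be given per Eq. (A.3). Fix an increasing sequence of instance sizes
$n_1 < n_2 < n_3 < \dots$, and let $x^t(n) = x^{\ell,t}(n)$, $z^t(n) = z^{\ell,t}(n)$ for all $n_\ell \le n < n_{\ell+1}$. Choosing $\{n_\ell\}_{\ell\ge 0}$
that increases rapidly enough we can ensure that, for all $n \ge n_\ell$,

$$ \max_{i\in[m]} \Big| (b_t^{\ell})_{ii} - \frac{1}{\delta}\, \mathbb{P}\{ |X + \sigma_{t-1} Z| \ge \alpha\sigma_{t-1} \} \Big| \le 2\beta_\ell , \tag{A.23} $$

$$ \frac{1}{n} \big\| x^{\ell,t+1} - \eta( x^{\ell,t} + A^{T} z^{\ell,t} ; \alpha\sigma_t ) \big\|_2^2 \le 2\beta_\ell , \tag{A.24} $$

$$ \Big| \frac{1}{n} \sum_{i=1}^{n} \psi\big( x_{0,i},\, x_i^{\ell,t} + (A^{T} z^{\ell,t})_i \big) - \mathbb{E}\,\psi(X, X + \sigma_t Z) \Big| \le 2\beta_\ell , \tag{A.25} $$

with probability larger than $1 - \beta_\ell$. Points 1, 2, 3 in the proposition then follow since $\beta_\ell \to 0$.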

In order to prove Eqs. (A.20) to (A.22) we proceed as follows. It is easy to check that $\sigma_t > 0$ for
all $t$, cf. Eq. (5.8). We use Theorem 9 to construct polynomials $\eta_t$ such that

$$ | \eta(x; \alpha\sigma_t) - \eta_t(x) | \le \xi \exp\Big\{ \frac{x^2}{16 \max(\sigma_t^2, s^2)} \Big\} \tag{A.26} $$

for all $x \in \mathbb{R}$. Here $\xi > 0$ is a small parameter to be chosen below, and $s^2$ is the smallest variance of
the Gaussians that are combined in $p_X$. Let $\hat{\sigma}_t$ be defined by

$$ \hat{\sigma}_{t+1}^2 = \frac{1}{\delta}\, \mathbb{E}\big\{ [\eta_t(X + \hat{\sigma}_t Z) - X]^2 \big\} \tag{A.27} $$

with $Z \sim N(0, 1)$ independent from $X \sim p_X$, and $\hat{\sigma}_0^2 = \mathbb{E}\{X^2\}/\delta$. Notice that $\hat{\sigma}_t^2 = R_{t,t}$. From
Eqs. (5.8), (A.26), and (A.27) it is then straightforward to show that $|\hat{\sigma}_t^2 - \sigma_t^2| \le C\xi$ for some
$C = C(t)$.

Given polynomials as defined by (A.26), we define $\{x^t, z^t\}_{t\ge 0}$ as per Eqs. (A.3), (A.4), with the
initial condition given there. Equation (A.22) follows immediately from Corollary 16 for $\xi$ sufficiently
small. Equation (A.21) also follows from the same Corollary, by taking

$$ \psi(x_1, x_2, x_3) = \{ \eta_t(x_3) - \eta(x_3; \alpha\sigma_t) \}^2 , $$

and then using once again Eq. (A.26) on the resulting expression.
Finally, consider Eq. (A.20). For economy of notation, we write

$$ (b_t)_{ii} = \sum_{j\in[n]} A_{ij}^2\, \varphi_j , \tag{A.28} $$

$$ \varphi_j \equiv \eta'_{t-1}\big( D_{jj} x_j^{t-1} + (A^{T} z^{t-1})_j - u_{0,j} \big) , \tag{A.29} $$

and further define

$$ \bar{b}_t \equiv \frac{1}{m} \sum_{j\in[n]} \varphi_j . \tag{A.30} $$
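Then we have

$$ \mathbb{E}\{ ((b_t)_{ii} - \bar{b}_t)^4 \} = \sum_{j_1,j_2,j_3,j_4\in[n]} \mathbb{E}\Big\{ \Big(A_{ij_1}^2 - \frac{1}{m}\Big) \Big(A_{ij_2}^2 - \frac{1}{m}\Big) \Big(A_{ij_3}^2 - \frac{1}{m}\Big) \Big(A_{ij_4}^2 - \frac{1}{m}\Big)\, \varphi_{j_1}\varphi_{j_2}\varphi_{j_3}\varphi_{j_4} \Big\} \equiv \sum_{j_1,j_2,j_3,j_4\in[n]} E(j_1, j_2, j_3, j_4) . $$

Using the tree representation in Section 3.2, it is not hard to prove that the expectation on the
right-hand side is bounded as follows

$E(p, q, r, s) \le K/n^6$, for $p, q, r, s$ distinct,
$E(q, q, r, s) \le K/n^5$, for $q, r, s$ distinct,
$E(r, r, s, s) \le K/n^4$, for $r, s$ distinct,
$E(r, r, r, s) \le K/n^4$, for $r, s$ distinct,
$E(r, r, r, r) \le K/n^3$.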

Consider for instance the first case, $p, q, r, s$ distinct. Using Lemma 3, each of $\varphi_p, \varphi_q, \varphi_r, \varphi_s$ can be
represented as a sum over trees with root type respectively at $p, q, r, s$. The weight of these trees is
as in Lemma 3 times the prefactor $(A_{ip}^2 - m^{-1}) \cdots (A_{is}^2 - m^{-1})$. Let $\mu$ be the total number of edges in
these trees, plus 8 (two for each of the additional factors). Then any non-vanishing contribution is of
order $n^{-\mu/2}$. Let $G$ be the graph obtained by identifying the vertices of the same type in these trees,
and $e(G)$ the number of its edges. Since each edge in $G$ must be covered at least twice by the trees
to get a non-zero expectation, and the edges $(i, p), \dots, (i, s)$ at least once, we have $2e(G) + 4 \le \mu$.
The number of vertices in $G$ is at most $e(G) + 1$ (note that $G$ is connected because it includes type
$i$ connected to $p, q, r, s$). Of these vertices all but 5 (whose type is $i, p, q, r, s$) can take an arbitrary
type, yielding a combinatorial factor of order $n^{e(G)+1-5} \le n^{\mu/2-6}$. Hence the sum over trees is of
order $n^{-\mu/2}\, n^{\mu/2-6} = n^{-6}$ as claimed.

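Summing over $j_1, \dots, j_4$ the above bounds we obtain $\mathbb{E}\{ ((b_t)_{ii} - \bar{b}_t)^4 \} \le K/n^2$ and therefore,
by Markov inequality

$$ \lim_{n\to\infty} \mathbb{P}\Big\{ \max_{i\in[m]} | (b_t)_{ii} - \bar{b}_t | \ge n^{-1/5} \Big\} = 0 . \tag{A.31} $$

Since by standard concentration bounds $\max_{i\in[n]} D_{ii},\ \min_{i\in[n]} D_{ii} \to 1$, we obtain, in probability,

$$ \lim_{n\to\infty} \max_{i\in[m]} (b_t)_{ii} = \lim_{n\to\infty} \min_{i\in[m]} (b_t)_{ii} = \lim_{n\to\infty} \bar{b}_t = \lim_{n\to\infty} \frac{1}{m} \sum_{j\in[n]} \eta'_{t-1}\big( x_j^{t-1} + (A^{T} z^{t-1})_j \big) = \frac{1}{\delta}\, \mathbb{E}\{ \eta'_{t-1}(X + \hat{\sigma}_{t-1} Z) \} , $$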
where, in the last step, we applied Corollary 16 to the polynomials $\eta'_{t-1}$, and $X \sim p_X$, $Z \sim N(0, 1)$
are independent. We are left with the task of showing that, by taking $\xi$ small enough in Eq. (A.26),
we can ensure that

$$ \big| \mathbb{E}\{ \eta'_{t-1}(X + \hat{\sigma}_{t-1} Z) \} - \mathbb{P}\{ |X + \sigma_{t-1} Z| \ge \alpha\sigma_{t-1} \} \big| \le \beta\delta . \tag{A.32} $$



Indeed integrating by parts with respect to Z the above difference can be written as (for K a finite 
constant that can depend on t and change from line to line) 

E{Zr) t -i(X + a t -i Z)} - —E{Zr]{X + a t -i Z; aa t -i)} 



<?t-\ 
< KE 



< K£Eiex.p 



Zrj t ^ i (X + a t - i Z) - Zrj(X + o t - 1 Z\ aa t ~ i ] 
X 2 + a 2 _ x X 2 



+ K\a t -i - a t -\\ 



4 max( 
< KZ + K^t-x-at-x]- 



als 2 )\] 



+ K\a t -i - a t -i\ 



The claim follows by noting that, as argued above \(7t—i ~ o^t— 1| < 

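Consider finally point 4. First recall that we constructed the vectors $\{x^t, z^t\}_{t\ge 0}$ using a sequence
of orbits $\{x^{\ell,t}, z^{\ell,t}\}_{t\ge 0}$, indexed by $\ell \in \mathbb{N}$, that obey Eqs. (A.3), (A.4), and letting

$$ x^t(n) = x^{\ell,t}(n) , \qquad z^t(n) = z^{\ell,t}(n) , \qquad \text{for all } n \text{ with } n_\ell \le n < n_{\ell+1} . \tag{A.33} $$

Claim 17. There exists a sequence $\{\beta_\ell\}_{\ell\in\mathbb{N}}$ with $\lim_{\ell\to\infty} \beta_\ell = 0$ such that, for all $\ell' > \ell$,

$$ \lim_{n\to\infty} \frac{1}{n} \sum_{i\in[n]} \mathbb{E}\big\{ \big( (x^{\ell,t} + A^{T} z^{\ell,t})_i - (x^{\ell',t} + A^{T} z^{\ell',t})_i \big)^2 \big\} \le \beta_\ell , \tag{A.34} $$

$$ \lim_{n\to\infty} \frac{1}{m} \sum_{i\in[m]} \mathbb{E}\big\{ ( z_i^{\ell,t} - z_i^{\ell',t} )^2 \big\} \le \beta_\ell . \tag{A.35} $$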

The proof of this claim is presented below. It follows from this claim that, by eventually redefining 
n,£> to be larger we can ensure 

E{((x^ t + A T z e '' t ) I -(x^ + A T /' t ) I ) 2 } < 2fy, 



for all n > riff. Here and below expectation is taken also with respect to / uniformly random in [n] 
and J uniformly random in [m]. By Eq. (|A.33p . for all n > ri£, we also have 

E{((x t + A T z t ) I -(x e ' t + A T z^) I ) 2 } < 2ft, 

< 2ft. 

Applying LemmaHl we can then construct {x l , z l }t>o as in the statement at point 4, such that 
E{({x t + A T z t )i-(x e ' t + A T z e ' t ) I ) 2 } < K{u 2 + n~ 1 / 2 ), 



-l/2> 



where $K$ depends on $\ell$ but not on $\nu$ or $n$. The proof is finished by using the triangle inequality and selecting
$\ell = \ell(\nu, t)$ diverging slowly enough as $\nu \to 0$. □
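We now prove Claim 17.

Proof of Claim 17. To be definite we will focus on Eq. (A.34).

Fix $\ell, \ell' \in \mathbb{N}$ (not necessarily distinct). By an immediate generalization of Corollary 16 we have,
in probability

$$ \lim_{n\to\infty} \frac{1}{n} \sum_{i\in[n]} \mathbb{E}\big\{ (x^{\ell,t} + A^{T} z^{\ell,t} - x_0)_i\, (x^{\ell',t} + A^{T} z^{\ell',t} - x_0)_i \big\} = Q^{t}_{\ell,\ell'} . \tag{A.36} $$

Further, the quantities $Q^{t}_{\ell,\ell'}$ satisfy the state evolution recursion

$$ Q^{t+1}_{\ell,\ell'} = \frac{1}{\delta}\, \mathbb{E}\big\{ [\eta_t^{\ell}(X + Z_{t,\ell}) - X]\, [\eta_t^{\ell'}(X + Z_{t,\ell'}) - X] \big\} , \tag{A.37} $$

with initial condition $Q^{0}_{\ell,\ell'} = (1/\delta)\, \mathbb{E}\{X^2\}$. Here expectation is taken with respect to $X \sim p_X$
and the independent centered Gaussian vector $(Z_{t,\ell}, Z_{t,\ell'})$ with covariance given by $\mathbb{E}\{Z_{t,\ell}^2\} = Q^{t}_{\ell,\ell}$,
$\mathbb{E}\{Z_{t,\ell'}^2\} = Q^{t}_{\ell',\ell'}$, $\mathbb{E}\{Z_{t,\ell} Z_{t,\ell'}\} = Q^{t}_{\ell,\ell'}$. In order to prove the claim, it is therefore sufficient to show
that

$$ \lim_{\ell\to\infty} \sup_{\ell'\ge\ell} | Q^{t}_{\ell,\ell'} - \sigma_t^2 | = 0 , \tag{A.38} $$

since this implies $\lim_{\ell\to\infty} \sup_{\ell'\ge\ell} [ Q^{t}_{\ell,\ell} - 2 Q^{t}_{\ell,\ell'} + Q^{t}_{\ell',\ell'} ] = 0$, which in turn implies the Claim, via
Eq. (A.36).

Finally, recall that $\eta_t^{\ell}$ was constructed using Theorem 9, cf. Eq. (A.26), in such a way that, for
all $x \in \mathbb{R}$,

$$ | \eta(x; \alpha\sigma_t) - \eta_t^{\ell}(x) | \le \xi_\ell \exp\Big\{ \frac{x^2}{16 \max(\sigma_t^2, s^2)} \Big\} , \tag{A.39} $$

with $\xi_\ell \to 0$ as $\ell \to \infty$. The desired estimate (A.38) then follows by recalling that $\sigma_{t+1}^2 =
(1/\delta)\, \mathbb{E}\{ [\eta(X + \sigma_t Z; \alpha\sigma_t) - X]^2 \}$ and using Eq. (A.39) inductively to show that $| Q^{t}_{\ell,\ell'} - \sigma_t^2 | \le K(t)\, \xi_\ell$. □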

We finally sketch the proof of Proposition 15.

Proof of Proposition 15. The sequence $\{x^t, z^t\}_{t\ge 0}$ is constructed as in the previous statement. The
proof hence follows by using Corollary 16 and taking $\xi$ small enough in Eq. (A.26), since we can
ensure that the array $R_{t,s}$ of Eq. (A.5) differs from its counterpart for the soft-thresholding functions
by at most $\beta'$, for any $\beta' > 0$ and any $t, s \le t_{\max}$ (as shown above for the case $t = s$). □

B Proof of Lemma 5

Throughout the proof we denote by $C_1, C_2, C_3$ etc. positive constants that depend uniquely on
$c_1, \dots, c_3$.

Consider the $\ell_1$ minimization problem

minimize $\|x\|_1$ ,
subject to $y = Ax$ ,

and denote by $\hat{x}$ any minimizer. Further, let $v$ be a subgradient as in the statement, and define, for
some $c \in (0, 1)$,

$$ S(c) = \{ i \in [n] : |v_i| \ge 1 - c \} . \tag{B.1} $$

Also, let $\bar{S}(c) = [n] \setminus S(c)$ be the complement of this set. Notice that, by definition of subgradient,
we have $v_i = \mathrm{sign}(x_{0,i})$ for all $i \in S$ and $|v_i| \le 1$ for all $i$ in $\bar{S} = [n] \setminus S$. This implies that $S \subseteq S(c)$.
We have

$$ \|\hat{x}\|_1 = \|x_0\|_1 + \langle v, (\hat{x} - x_0) \rangle + R_1 + R_2 , \tag{B.2} $$

$$ R_1 = \|\hat{x}_{S(c)}\|_1 - \|x_{0,S(c)}\|_1 - \langle v_{S(c)}, (\hat{x} - x_0)_{S(c)} \rangle , \tag{B.3} $$

$$ R_2 = \|\hat{x}_{\bar{S}(c)}\|_1 - \|x_{0,\bar{S}(c)}\|_1 - \langle v_{\bar{S}(c)}, (\hat{x} - x_0)_{\bar{S}(c)} \rangle . \tag{B.4} $$

Since $\bar{S}(c) \subseteq \bar{S}$, we have $x_{0,\bar{S}(c)} = 0$ and hence

$$ R_2 = \|\hat{x}_{\bar{S}(c)}\|_1 - \langle v_{\bar{S}(c)}, \hat{x}_{\bar{S}(c)} \rangle = \sum_{i\in\bar{S}(c)} ( |\hat{x}_i| - v_i \hat{x}_i ) \ge \sum_{i\in\bar{S}(c)} ( |\hat{x}_i| - (1 - c) |\hat{x}_i| ) = c\, \|\hat{x}_{\bar{S}(c)}\|_1 . \tag{B.5} $$

On the other hand, $v_{S(c)}$ is in the subgradient of $\|x_{S(c)}\|_1$ at $x_{S(c)} = x_{0,S(c)}$. Hence $R_1 \ge 0$. It follows
that Eq. (B.2) implies $\|\hat{x}\|_1 \ge \|x_0\|_1 + \langle v, (\hat{x} - x_0) \rangle + c\, \|\hat{x}_{\bar{S}(c)}\|_1$. Since $\hat{x}$ is a minimizer, we thus get



l%f c Ji - --( v > ( x ~ x o)) = (x-x Q )) < -Vn\\x - x \\ 2 , (B.6) 

c c c 



where in the last step we used Cauchy-Schwarz together with assumption [TJ Hereafter we let r = 
x — Xq. 

Let 5(c) = \jf =l Si be a partition such that nc/2 < |5^| < nc, and that |rj| < \rA for each i G 5^, 
j G 5^_i- If 1 5(c) I < nc/2, such a partition does not exist, but the argument follows by an obvious 
modification of the one below. Further define 5+ = U^ 2 ^ — S( c ) an d 5+ = [n] \ 5+. We have 



3 = EKII3 £ f>l (^'f <i EKII? £ ^11^,115- P") 

£=2 £=2 «=1 



Fix c = c\. Since 5(c) C 5, we have = ^%( c ) and using Eq. (|B.6p we conclude that there exists 
C\ < 4/cf such that 

||^ + ||i<C ie 2 ||r||i. (R8) 

On the other hand, by definition Ar = 0, and hence As + r$ + + ^s + r s+ = ^- Since 5(c) C 5, we have 
5 C 5(c) C 5+. Further 5 + \ 5(c) = 5i, whence |5 + \ 5(c) | < nc = nc\. By assumption [21 we have 
fmm(^4s + ) > C2 and therefore 

Iks+lb < — \\As + r s+ h = — 11^.^. Ha < —Iks+lb- 

C2 C2 + + C2 + 

Combining this with Eq. (|B.8j) . we deduce that ||r||2 < C2E IMI2 for some C2 = 02(01,02,03), which 
in turns implies r = provided that C2E < 1. The claim hence follows for Eq = l/[2C 2 (ci, c 2 , C3)]. 
C Asymptotic analysis of state evolution: Proof of Lemma 6

Before proceeding, we introduce the following piece of notation (following [BM12]). Fix a probability
distribution $p_X$ on $\mathbb{R}$, with $p_X(\{0\}) = 1 - \varepsilon$, and $\delta > 0$. For $\theta, \sigma^2 \ge 0$, we define

$$ F(\sigma^2, \theta) = \frac{1}{\delta}\, \mathbb{E}\big\{ [\eta(X + \sigma Z; \theta) - X]^2 \big\} , \tag{C.1} $$

where expectation is taken with respect to the independent random variables $X \sim p_X$ and $Z \sim
N(0, 1)$. When necessary, we will indicate the dependency on $p_X$ by $F(\sigma^2, \theta; p_X)$. With this notation
the state evolution recursion reads $\sigma_{t+1}^2 = F(\sigma_t^2, \alpha\sigma_t)$. The following properties of the function $F$
were proved in [DMM09] (but see also [BM12], Appendix A for a more explicit treatment).

Lemma 7 ([DMM09]). For any $\alpha > 0$, the mapping $\sigma^2 \mapsto F(\sigma^2, \alpha\sigma)$ is monotone increasing and
concave with $F(0, 0) = 0$ and

$$ \frac{\mathrm{d}F}{\mathrm{d}(\sigma^2)}(\sigma^2, \alpha\sigma) \Big|_{\sigma=0} = \frac{1}{\delta} \big\{ \varepsilon (1 + \alpha^2) + 2 (1 - \varepsilon)\, \mathbb{E}[ (Z - \alpha)_+^2 ] \big\} . \tag{C.2} $$
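As a numerical sanity check of the slope formula (C.2), the following Python sketch evaluates $F(\sigma^2, \alpha\sigma)$ by quadrature and compares $F(\sigma^2, \alpha\sigma)/\sigma^2$ at small $\sigma$ with the right-hand side of (C.2). It is only an illustration under stated assumptions: $\eta$ is taken to be the soft-thresholding function (consistent with $\eta'(y;\alpha) = 1(|y| \ge \alpha)$ used in the proof of Lemma 10), the signal distribution is a hypothetical three-point law (mass $1-\varepsilon$ at $0$ and $\varepsilon/2$ at $\pm 1$), and the helper names `gauss_mean`, `F` and `slope_at_zero` are ours.

```python
import math

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def eta(y, theta):
    """Soft thresholding: sign(y) * max(|y| - theta, 0) (assumed form of eta)."""
    return math.copysign(max(abs(y) - theta, 0.0), y)

def gauss_mean(g, zmax=8.0, n=4000):
    """Approximate E g(Z), Z ~ N(0,1), by the composite Simpson rule on [-zmax, zmax]."""
    h = 2.0 * zmax / n
    total = g(-zmax) * phi(-zmax) + g(zmax) * phi(zmax)
    for k in range(1, n):
        z = -zmax + k * h
        total += (4.0 if k % 2 else 2.0) * g(z) * phi(z)
    return total * h / 3.0

def F(sigma2, theta, atoms, delta):
    """(1/delta) E[(eta(X + sigma Z; theta) - X)^2], X discrete with atoms [(x, prob), ...]."""
    sigma = math.sqrt(sigma2)
    mse = 0.0
    for x, p in atoms:
        mse += p * gauss_mean(lambda z: (eta(x + sigma * z, theta) - x) ** 2)
    return mse / delta

def slope_at_zero(alpha, eps, delta):
    """Right-hand side of (C.2), using E[(Z - alpha)_+^2] = (1 + alpha^2) Phi(-alpha) - alpha phi(alpha)."""
    tail = (1.0 + alpha ** 2) * Phi(-alpha) - alpha * phi(alpha)
    return (eps * (1.0 + alpha ** 2) + 2.0 * (1.0 - eps) * tail) / delta

if __name__ == "__main__":
    eps, delta, alpha, sigma = 0.1, 0.5, 1.5, 1e-2   # illustrative values only
    atoms = [(0.0, 1.0 - eps), (1.0, eps / 2.0), (-1.0, eps / 2.0)]
    print(F(sigma ** 2, alpha * sigma, atoms, delta) / sigma ** 2,
          slope_at_zero(alpha, eps, delta))
```

The two printed numbers should agree closely, because for small $\sigma$ the nonzero atoms contribute $\sigma^2(1+\alpha^2)$ each and the atom at zero contributes $2\sigma^2\, \mathbb{E}[(Z-\alpha)_+^2]$.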

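It is also convenient to define

$$ G_\varepsilon(\alpha) = \varepsilon (1 + \alpha^2) + 2 (1 - \varepsilon)\, \mathbb{E}\{ (Z - \alpha)_+^2 \} \tag{C.3} $$
$$ \phantom{G_\varepsilon(\alpha)} = \varepsilon (1 + \alpha^2) + 2 (1 - \varepsilon) \big[ (1 + \alpha^2) \Phi(-\alpha) - \alpha \phi(\alpha) \big] . $$

The first two derivatives of $\alpha \mapsto G_\varepsilon(\alpha)$ will be used in the proof

$$ G'_\varepsilon(\alpha) = 2\alpha\varepsilon + 4 (1 - \varepsilon) \big[ -\phi(\alpha) + \alpha \Phi(-\alpha) \big] , \tag{C.4} $$
$$ G''_\varepsilon(\alpha) = 2\varepsilon + 4 (1 - \varepsilon) \Phi(-\alpha) . \tag{C.5} $$

In particular, we have the following.

Lemma 8. For any $\varepsilon \in (0, 1)$, $\alpha \mapsto G_\varepsilon(\alpha)$ is strictly convex in $\alpha \in \mathbb{R}_+$, with a unique minimum at
$\alpha_*(\varepsilon) \in (0, \infty)$. Further $G_\varepsilon(0) = 1$ and $\lim_{\alpha\to\infty} G_\varepsilon(\alpha) = \infty$. Finally, the minimum value satisfies

$$ G_\varepsilon(\alpha_*) = \varepsilon + 2 (1 - \varepsilon) \Phi(-\alpha_*) = \tfrac{1}{2} G''_\varepsilon(\alpha_*) \in (0, 1) . \tag{C.6} $$

Proof. By inspection of Eq. (C.5), $G''_\varepsilon(\alpha) > 0$ for all $\alpha \ge 0$, hence $G_\varepsilon(\alpha)$ is strictly convex. Further,
from Eq. (C.4), we have $G'_\varepsilon(0) = -4 (1 - \varepsilon) \phi(0) < 0$ and $G'_\varepsilon(\alpha) = 2\alpha\varepsilon + o_\alpha(1) > 0$ as $\alpha \to \infty$. Hence
$\alpha \mapsto G_\varepsilon(\alpha)$ has a unique minimum $\alpha_*(\varepsilon) \in (0, \infty)$.

Finally, Eq. (C.6) follows immediately by using the condition $G'_\varepsilon(\alpha_*) = 0$ in the expression
(C.3). □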
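The statements of Lemma 8 are easy to check numerically. The sketch below implements $G_\varepsilon$ and its first two derivatives from (C.3)-(C.5), locates $\alpha_*(\varepsilon)$ by bisection on $G'_\varepsilon$ (which is negative at $0$ and positive for large $\alpha$), and evaluates both sides of (C.6). The function names and the bisection bracket $[0, 50]$ are our own choices for illustration.

```python
import math

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def G(alpha, eps):
    """G_eps(alpha), second line of (C.3)."""
    tail = (1.0 + alpha ** 2) * Phi(-alpha) - alpha * phi(alpha)
    return eps * (1.0 + alpha ** 2) + 2.0 * (1.0 - eps) * tail

def dG(alpha, eps):
    """First derivative, Eq. (C.4)."""
    return 2.0 * alpha * eps + 4.0 * (1.0 - eps) * (alpha * Phi(-alpha) - phi(alpha))

def d2G(alpha, eps):
    """Second derivative, Eq. (C.5)."""
    return 2.0 * eps + 4.0 * (1.0 - eps) * Phi(-alpha)

def alpha_star(eps, lo=0.0, hi=50.0, iters=200):
    """Unique zero of G'_eps on (0, infinity), found by bisection (bracket is an assumption)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if dG(mid, eps) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    for eps in (0.05, 0.1, 0.3, 0.7):
        a = alpha_star(eps)
        # The three values in each row should coincide, as stated in (C.6).
        print(eps, a, G(a, eps), eps + 2.0 * (1.0 - eps) * Phi(-a), 0.5 * d2G(a, eps))
```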

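In our proof it is more convenient to use the coordinates $(\delta, \varepsilon)$ instead of $(\rho, \delta)$. In terms of the
latter, the phase boundary (1.2), (1.3) reads

$$ \delta_*(\varepsilon) = \frac{2\phi(\alpha_*(\varepsilon))}{\alpha_*(\varepsilon) + 2 \big[ \phi(\alpha_*(\varepsilon)) - \alpha_*(\varepsilon) \Phi(-\alpha_*(\varepsilon)) \big]} , \tag{C.7} $$

$$ \alpha_*(\varepsilon) \ \text{solves} \ \ \alpha\varepsilon + 2 (1 - \varepsilon) \big[ \alpha \Phi(-\alpha) - \phi(\alpha) \big] = 0 . \tag{C.8} $$

Notice that the use of the symbol $\alpha_*(\varepsilon)$ in the last equations is not an abuse of notation. Indeed
comparing Eq. (C.8) with (C.4) we conclude that $\alpha_*(\varepsilon)$ is indeed the unique solution of $G'_\varepsilon(\alpha) = 0$.
Further, comparing Eq. (C.7) with Eq. (C.3) we obtain the following.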
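Lemma 9. Let $(\delta, \rho_*(\delta))$ be the phase boundary defined by Eqs. (1.2), (1.3). Then, for $\rho, \delta \in [0, 1]$,
$\rho > \rho_*(\delta)$ if and only if, for $\varepsilon \in (0, 1)$, $\delta \in (\varepsilon, 1)$

$$ \delta < \delta_*(\varepsilon) \equiv \min_{\alpha\ge 0} G_\varepsilon(\alpha) . \tag{C.9} $$

Vice versa $\rho < \rho_*(\delta)$ if and only if $\delta > \delta_*(\varepsilon)$.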
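The identification of the phase boundary with $\min_\alpha G_\varepsilon(\alpha)$ can be illustrated numerically. The sketch below computes $\delta_*(\varepsilon)$ in two ways: by direct minimization of $G_\varepsilon$ (a ternary search, valid because $G_\varepsilon$ is strictly convex), and through the parametric expression $2\phi(\alpha_*)/\{\alpha_* + 2[\phi(\alpha_*) - \alpha_*\Phi(-\alpha_*)]\}$ with $\alpha_*$ the root of $\alpha\varepsilon + 2(1-\varepsilon)[\alpha\Phi(-\alpha) - \phi(\alpha)] = 0$. The parametric form is written here as we read it from the Donoho-Tanner parametrization; the function names and search brackets are assumptions made for the example.

```python
import math

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def G(alpha, eps):
    """G_eps(alpha) as in Eq. (C.3)."""
    tail = (1.0 + alpha ** 2) * Phi(-alpha) - alpha * phi(alpha)
    return eps * (1.0 + alpha ** 2) + 2.0 * (1.0 - eps) * tail

def delta_star_by_minimization(eps, lo=0.0, hi=20.0, iters=200):
    """min over alpha >= 0 of G_eps(alpha), via ternary search on a convex function."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if G(m1, eps) < G(m2, eps):
            hi = m2
        else:
            lo = m1
    return G(0.5 * (lo + hi), eps)

def alpha_star(eps, lo=0.0, hi=50.0, iters=200):
    """Root of alpha*eps + 2(1-eps)[alpha*Phi(-alpha) - phi(alpha)] = 0 by bisection."""
    def g(a):
        return a * eps + 2.0 * (1.0 - eps) * (a * Phi(-a) - phi(a))
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def delta_star_parametric(eps):
    """Parametric expression for the boundary, evaluated at alpha_*(eps)."""
    a = alpha_star(eps)
    return 2.0 * phi(a) / (a + 2.0 * (phi(a) - a * Phi(-a)))

if __name__ == "__main__":
    for eps in (0.05, 0.1, 0.2, 0.4):
        print(eps, delta_star_by_minimization(eps), delta_star_parametric(eps))
```

The two columns should coincide, and each value should lie in $(\varepsilon, 1)$, in agreement with the range of $\delta$ in the lemma.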
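C.1 Proof of Lemma 6(a): $\rho < \rho_*(\delta)$

Proof of Lemma 6 (a1). We set $\alpha = \alpha_*(\varepsilon) = \arg\min_{\alpha\ge 0} G_\varepsilon(\alpha)$. Hence we have, by Lemma 7 and
Lemma 9,

$$ \frac{\mathrm{d}F}{\mathrm{d}(\sigma^2)}(\sigma^2, \alpha_*\sigma) \Big|_{\sigma=0} = \frac{1}{\delta} \min_{\alpha\ge 0} G_\varepsilon(\alpha) = \frac{\delta_*(\varepsilon)}{\delta} . \tag{C.10} $$

In particular, by Lemma 9, for $\rho < \rho_*(\delta)$, we have $\frac{\mathrm{d}F}{\mathrm{d}(\sigma^2)}(\sigma^2, \alpha_*\sigma)\big|_{\sigma=0} = \omega_*(\varepsilon, \delta) \in (0, 1)$. Since, by
Lemma 7, $\sigma^2 \mapsto F(\sigma^2, \alpha_*\sigma)$ is concave, it follows that $\sigma_t^2 = B\, \omega_*^t\, [1 + o_t(1)]$.

Let $S = \{ \alpha \in \mathbb{R}_+ : G_\varepsilon(\alpha)/\delta < 1 \}$. Since $\alpha \mapsto G_\varepsilon(\alpha)$ is strictly convex by Lemma 8, with
$G_\varepsilon(0), G_\varepsilon(\infty) > \delta$, we have $S = (\alpha_1, \alpha_2)$ with $0 < \alpha_1 < \alpha_* < \alpha_2 < \infty$. Let $\omega(\alpha) = G_\varepsilon(\alpha)/\delta$. Fixing
$\alpha \in (\alpha_1, \alpha_2)$, by concavity of $\sigma^2 \mapsto F(\sigma^2, \alpha\sigma)$, we have $\sigma_t^2 = B\, \omega(\alpha)^t\, [1 + o_t(1)]$. Finally, by continuity
of $\alpha \mapsto G_\varepsilon(\alpha)$, we have $\{ \omega(\alpha) : \alpha \in (\alpha_1, \alpha_2) \} = [\omega_*, 1)$ and hence any rate $\omega \in [\omega_*, 1)$ can be
realized.

Finally, by Lemma 8, $G_\varepsilon(\alpha_*) = \varepsilon + 2 (1 - \varepsilon) \Phi(-\alpha_*) < \delta$. Since $\alpha \mapsto \varepsilon + 2 (1 - \varepsilon) \Phi(-\alpha)$ is decreasing
in $\alpha$, the last claim follows. □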
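The geometric decay of $\sigma_t^2$ at rate $\omega(\alpha) = G_\varepsilon(\alpha)/\delta$ can be observed by iterating the state evolution map numerically. The following sketch does this for a hypothetical three-point signal (mass $1-\varepsilon$ at $0$, $\varepsilon/2$ at $\pm 1$) with soft thresholding as $\eta$ and illustrative parameter values chosen so that $G_\varepsilon(\alpha) < \delta$; the quadrature routine and all names are our own, not part of the original argument.

```python
import math

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def eta(y, theta):
    """Soft thresholding (assumed form of eta)."""
    return math.copysign(max(abs(y) - theta, 0.0), y)

def gauss_mean(g, zmax=8.0, n=2000):
    """E g(Z) for Z ~ N(0,1) by the composite Simpson rule on [-zmax, zmax]."""
    h = 2.0 * zmax / n
    total = g(-zmax) * phi(-zmax) + g(zmax) * phi(zmax)
    for k in range(1, n):
        z = -zmax + k * h
        total += (4.0 if k % 2 else 2.0) * g(z) * phi(z)
    return total * h / 3.0

def F(sigma2, theta, atoms, delta):
    """State evolution map (C.1) for a discrete signal given as [(x, prob), ...]."""
    sigma = math.sqrt(sigma2)
    mse = 0.0
    for x, p in atoms:
        mse += p * gauss_mean(lambda z: (eta(x + sigma * z, theta) - x) ** 2)
    return mse / delta

def G(alpha, eps):
    """G_eps(alpha) as in Eq. (C.3)."""
    tail = (1.0 + alpha ** 2) * Phi(-alpha) - alpha * phi(alpha)
    return eps * (1.0 + alpha ** 2) + 2.0 * (1.0 - eps) * tail

def state_evolution(alpha, atoms, delta, steps):
    """Iterate sigma_{t+1}^2 = F(sigma_t^2, alpha*sigma_t) from sigma_0^2 = E{X^2}/delta."""
    sigma2 = sum(p * x * x for x, p in atoms) / delta
    trajectory = [sigma2]
    for _ in range(steps):
        sigma2 = F(sigma2, alpha * math.sqrt(sigma2), atoms, delta)
        trajectory.append(sigma2)
    return trajectory

if __name__ == "__main__":
    eps, delta, alpha = 0.1, 0.5, 1.5        # illustrative values only
    atoms = [(0.0, 1.0 - eps), (1.0, eps / 2.0), (-1.0, eps / 2.0)]
    traj = state_evolution(alpha, atoms, delta, steps=40)
    # Empirical contraction ratio at the last step versus the predicted rate G/delta.
    print(traj[-1] / traj[-2], G(alpha, eps) / delta)
```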

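In the proof of part (a2) we will make use of the following analytical result.

Lemma 10. For $\varepsilon \in (0, 1)$, $\alpha > \alpha_*(\varepsilon)$, consider the function $\mathcal{F}_{\alpha,\varepsilon} : [0, 1] \to \mathbb{R}$ defined by

$$ \mathcal{F}_{\alpha,\varepsilon}(Q) = \frac{1}{G_\varepsilon(\alpha)}\, \mathbb{E}\big\{ [\eta(X_\infty + Z_1; \alpha) - X_\infty]\, [\eta(X_\infty + Z_2; \alpha) - X_\infty] \big\} , \tag{C.11} $$

where expectation is taken with respect to $X_\infty$, $\mathbb{P}\{X_\infty = 0\} = 1 - \varepsilon$, $\mathbb{P}\{X_\infty \in \{+\infty, -\infty\}\} = \varepsilon$,
and the independent Gaussian vector $(Z_1, Z_2)$ with mean zero and covariance $\mathbb{E}\{Z_1^2\} = \mathbb{E}\{Z_2^2\} = 1$,
$\mathbb{E}\{Z_1 Z_2\} = Q$. (The mapping $x \mapsto [\eta(x + a; b) - x]$ is here extended to $x = +\infty, -\infty$ by continuity
for any $a, b$ bounded.)

Then $\mathcal{F}_{\alpha,\varepsilon}$ is increasing and convex on $[0, 1]$ with $\mathcal{F}_{\alpha,\varepsilon}(1) = 1$ and $\mathcal{F}'_{\alpha,\varepsilon}(1) < 1$. In particular
$\mathcal{F}_{\alpha,\varepsilon}(Q) > Q$ for all $Q \in [0, 1)$.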

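Proof. It is convenient to change variables and let $Q = e^{-s}$. We let $\{U_s\}_{s\in\mathbb{R}}$ denote the standard
Ornstein-Uhlenbeck process, $\mathrm{d}U_s = -U_s\, \mathrm{d}s + \sqrt{2}\, \mathrm{d}B_s$ with $\{B_s\}_{s\in\mathbb{R}}$ the standard Brownian motion.
Then $\mathcal{F}_{\alpha,\varepsilon}(Q) = \widehat{\mathcal{F}}_{\alpha,\varepsilon}(-\log(Q))$, with

$$ \widehat{\mathcal{F}}_{\alpha,\varepsilon}(s) = \frac{1}{G_\varepsilon(\alpha)}\, \mathbb{E}\big\{ [\eta(X_\infty + U_0; \alpha) - X_\infty]\, [\eta(X_\infty + U_s; \alpha) - X_\infty] \big\} . \tag{C.12} $$

A simple calculation yields

$$ \frac{\mathrm{d}}{\mathrm{d}s} \widehat{\mathcal{F}}_{\alpha,\varepsilon}(s) = - \frac{1}{G_\varepsilon(\alpha)}\, \mathbb{E}\big\{ \eta'(X_\infty + U_0; \alpha)\, \eta'(X_\infty + U_s; \alpha) \big\}\, e^{-s} , \tag{C.13} $$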
where $\eta'(\,\cdot\,; \alpha)$ denotes the derivative of $\eta$ with respect to its first argument. By the spectral
decomposition of the Ornstein-Uhlenbeck process, we have, for any function $\psi \in L^2$ (with respect to the stationary standard Gaussian law),



(C.14) 



fc=i 



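for some non-negative $\{\lambda_k\}_{k\ge 1}$. In particular $e^{s}\, \frac{\mathrm{d}}{\mathrm{d}s} \widehat{\mathcal{F}}_{\alpha,\varepsilon}(s)$ (cf. Eq. (C.13)) is strictly negative and increasing in $s$.

We therefore obtain

$$ \frac{\mathrm{d}}{\mathrm{d}Q} \mathcal{F}_{\alpha,\varepsilon}(Q) = \frac{1}{G_\varepsilon(\alpha)}\, \mathbb{E}\big\{ \eta'(X_\infty + Z_1; \alpha)\, \eta'(X_\infty + Z_2; \alpha) \big\} , \tag{C.15} $$

which is strictly positive and increasing in $Q$. Hence $Q \mapsto \mathcal{F}_{\alpha,\varepsilon}(Q)$ is increasing and strictly convex.
Finally, since $\eta'(y; \alpha) = 1(|y| \ge \alpha)$, we have

$$ \frac{\mathrm{d}\mathcal{F}_{\alpha,\varepsilon}}{\mathrm{d}Q}(1) = \frac{1}{G_\varepsilon(\alpha)}\, \mathbb{P}\{ |X_\infty + Z_1| \ge \alpha \} = \frac{1}{G_\varepsilon(\alpha)} \{ \varepsilon + 2 (1 - \varepsilon) \Phi(-\alpha) \} = \frac{G''_\varepsilon(\alpha)}{2 G_\varepsilon(\alpha)} . \tag{C.16} $$

Since by Lemma 8 $\alpha \mapsto G_\varepsilon(\alpha)$ is strictly increasing over $(\alpha_*(\varepsilon), \infty)$ and by Eq. (C.5) $\alpha \mapsto G''_\varepsilon(\alpha)$ is
strictly decreasing over $\mathbb{R}_+$, we have

$$ \frac{\mathrm{d}\mathcal{F}_{\alpha,\varepsilon}}{\mathrm{d}Q}(1) < \frac{G''_\varepsilon(\alpha_*(\varepsilon))}{2 G_\varepsilon(\alpha_*(\varepsilon))} = 1 , \tag{C.17} $$

where the last equality follows again by Lemma 8. This concludes the proof. □

We are now in position to prove part (a2) of Lemma 6.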



Proof of Lemma\^{a2) . Throughout the proof we fix a 6 (a*(e, 5), 02(6, 5)). Let the sequence 
{ a t}t>o be given as per the state evolution equation (|5.8j) . Define Q t = Rt,t-i/ '(ct&t-i)- By Propo- 
sition HH is the covariance of two gaussian random variables of variance 1. Hence \Qt\ < 1- Using 
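Eq. (5.15) we further have

$$ Q_{t+1} = \mathcal{F}_t(Q_t) , \tag{C.18} $$

$$ \mathcal{F}_t(Q) \equiv \frac{\sigma_{t-1}}{\delta\, \sigma_{t+1}}\, \mathbb{E}\Big\{ \Big[ \eta\Big( \frac{X}{\sigma_t} + Z_1; \alpha \Big) - \frac{X}{\sigma_t} \Big] \Big[ \eta\Big( \frac{X}{\sigma_{t-1}} + Z_2; \alpha \Big) - \frac{X}{\sigma_{t-1}} \Big] \Big\} , \tag{C.19} $$

where expectation is taken with respect to $X \sim p_X$ and the independent Gaussian random vector
$(Z_1, Z_2)$ with zero mean and covariance $\mathbb{E}\{Z_1^2\} = 1$, $\mathbb{E}\{Z_2^2\} = 1$, $\mathbb{E}\{Z_1 Z_2\} = Q$. By induction it is
easy to check that $Q_t \ge 0$ for all $t$.

For $\alpha \in (\alpha_1, \alpha_2)$, by part (a1) we have $\sigma_t \to 0$. Hence $X/\sigma_t$ converges in distribution (over the
completed real line) to a random variable $X_\infty \sim (1 - \varepsilon)\delta_0 + \varepsilon_+ \delta_{+\infty} + \varepsilon_- \delta_{-\infty}$ where $\varepsilon_+ = \mathbb{P}\{X > 0\}$,
$\varepsilon_- = \mathbb{P}\{X < 0\}$, $\varepsilon = \varepsilon_+ + \varepsilon_-$. Hence the expectation in Eq. (C.19) converges pointwise to

$$ \mathbb{E}\big\{ [\eta(X_\infty + Z_1; \alpha) - X_\infty]\, [\eta(X_\infty + Z_2; \alpha) - X_\infty] \big\} . \tag{C.20} $$

(Notice that this expectation depends on the distribution of $X_\infty$ only through $\varepsilon$, because of the
symmetry properties of the function $\eta$.)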
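Further, by the proof of part (a1), as $t \to \infty$ we have $\sigma_t^2 \to 0$ and

$$ \lim_{t\to\infty} \frac{\sigma_{t+1}^2}{\sigma_t^2} = \frac{\mathrm{d}F}{\mathrm{d}(\sigma^2)}(\sigma^2, \alpha\sigma) \Big|_{\sigma=0} = \frac{G_\varepsilon(\alpha)}{\delta} . \tag{C.21} $$

Hence

$$ \lim_{t\to\infty} \frac{\sigma_{t-1}}{\delta\, \sigma_{t+1}} = \frac{1}{G_\varepsilon(\alpha)} . \tag{C.22} $$

Comparing Eqs. (C.11) and (C.19) we conclude that, for any $Q \in [0, 1]$

$$ \lim_{t\to\infty} \mathcal{F}_t(Q) = \mathcal{F}_{\alpha,\varepsilon}(Q) . \tag{C.23} $$

Further the convergence is uniform, since the functions $\mathcal{F}_t$ are uniformly Lipschitz (see proof of
Lemma 10 above).

Consider now the sequence $\{Q_t\}_{t\ge 0}$ and let $Q_* = \liminf_{t\to\infty} Q_t$. Since $Q_t \in [0, 1]$ for all $t$, we
have $Q_* \in [0, 1]$ as well. We claim that in fact $Q_* = 1$ and therefore $\lim_{t\to\infty} Q_t = 1$, which implies
the thesis.

In order to prove the claim, let $\{Q_{t(k)}\}_{k\in\mathbb{N}}$ be a subsequence that converges to $Q_*$. Then

$$ Q_* = \lim_{k\to\infty} \mathcal{F}_{t(k)-1}(Q_{t(k)-1}) = \lim_{k\to\infty} \mathcal{F}_{\alpha,\varepsilon}(Q_{t(k)-1}) \ge \mathcal{F}_{\alpha,\varepsilon}\big( \liminf_{k\to\infty} Q_{t(k)-1} \big) \ge \mathcal{F}_{\alpha,\varepsilon}(Q_*) , \tag{C.24} $$

where, in the last step, we used the fact that $\mathcal{F}_{\alpha,\varepsilon}(\,\cdot\,)$ is monotone increasing. Since $\mathcal{F}_{\alpha,\varepsilon}(q) > q$ for
all $q \in [0, 1)$ by Lemma 10, we conclude that $Q_* = 1$. □

Before proving (a3) of Lemma 6 we establish one more technical result.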

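Lemma 11. Let $p_X$ be a probability measure on the real line such that $p_X(\{0\}) = 1 - \varepsilon$ and
$\mathbb{E}_{p_X}\{X^2\} < \infty$. Assume $p_X$ to be such that $\max( p_X((0, a)), p_X((-a, 0)) ) \le B a^b$ for some $B, b > 0$.
Then, letting $X_\infty \sim (1 - \varepsilon)\delta_0 + \varepsilon_+ \delta_{+\infty} + \varepsilon_- \delta_{-\infty}$ (with the notation introduced above, namely,
$\varepsilon_+ = p_X((0, +\infty))$ and $\varepsilon_- = p_X((-\infty, 0))$):

$$ \Big| \mathbb{E}\Big\{ \Big[ \eta\Big( \frac{X}{\sigma_t} + Z_1; \alpha \Big) - \frac{X}{\sigma_t} \Big] \Big[ \eta\Big( \frac{X}{\sigma_{t-1}} + Z_2; \alpha \Big) - \frac{X}{\sigma_{t-1}} \Big] \Big\} - \mathbb{E}\big\{ [\eta(X_\infty + Z_1; \alpha) - X_\infty]\, [\eta(X_\infty + Z_2; \alpha) - X_\infty] \big\} \Big| \le B' ( \sigma_t^b + \sigma_{t-1}^b ) \tag{C.25} $$

for an eventually different constant $B'$. Here expectation is taken with respect to $X \sim p_X$ and the
independent Gaussian random vector $(Z_1, Z_2)$ with zero mean and covariance $\mathbb{E}\{Z_1^2\} = 1$, $\mathbb{E}\{Z_2^2\} = 1$,
$\mathbb{E}\{Z_1 Z_2\} = Q_t$, and

$$ F(\sigma^2, \alpha\sigma) = \frac{\mathrm{d}F}{\mathrm{d}(\sigma^2)}(\sigma^2; \alpha\sigma) \Big|_{\sigma=0}\, \sigma^2 + O(\sigma^{2+b}) . \tag{C.26} $$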

Proof. By triangular inequality, the left hand side of Eq. (|C.25|) can be upper bounded as D\ + D 2 
whereby 



£>i = E 



A" 
0-1 



+ Zi;a 



X 



r ] (X 00 + Zr,a) +X 



+ Z 2 ;a 

o~t-i 



X 

o-t-i 



D 2 = E[[rj(X oo + Z 1 ;a)-X 



+ Z 2 \a 



X 



o~t- 



}• 

- - 7?(Xoo + Z 2 ; a) + X^ I . 
Here $X$ and $X_\infty$ are coupled in such a way that $X = 0$ if and only if $X_\infty = 0$ and the two variables
have the same sign in the other case. We focus on bounding $D_1$ since $D_2$ can be treated along the
same lines. Letting $R(x; \theta) = \eta(x; \theta) - x$, we have



D x =E 



r(— +Z 1 ;a) 



R 



X 



+ Z 2 ;a + Z 2 



} = D 1>a + Z> 1(6 . 



L>i, a = fi{ [r{^- + Zi;aj- R(X 00 + Zi; a) 



i? 



0t-i 



+ Z 2 ; a 



1,6 



R' 



,( x 



+ Zi;a) -^(Xoo + Zisa)]}, 



where in the last line we used Stein's lemma to integrate over Z 2 , and i?' denotes derivative with 
respect to the first argument. Once again the two terms are treated along the same lines, and we 
will only consider D\^ a . We have 



D ha \ < aE 



R 



(- 



;<*)|} 



<ae+E| i?(^± + Zi;a) - R(+oo; a)| } + ck-_e{ R(^—+Zy,a) -R(-oo;a) }, (C.27) 



X. 



where X + (resp. X_) is distributed as X conditioned on X > (resp. X < 0). The function 
x i—)- R(x; a) — R(oo; a) is monotone decreasing, equal to 2a for x < —a and to for x > a. Hence 
R(x) = Kz 1 {\R{x + Z\\a) — i?(+oo;a)|} is monotone decreasing, takes values in (0,2a) and upper 
bounded by Ce~ x / 4 for x > 0. Denoting by F + the distribution of X + , we have 



E 



fX \ 1 ~ f°° ~ 

R[-^- + Zi; a) - R(+oo; a) J = Ei2(X + /<T t ) = y F(za t ) dx < £V t 6 . 



The other term in Eq. (|C.27|) is bounded by the same argument. This concludes the proof of 
Eq. (1(125]) . 

The proof of Eq. ()C.26P follows from Eq. (]C.25P if we notice that 



F(a 2 ,aa) = ^- e{ [??(- + Z; a) - x] }, 



dF 1 2 \ 
[a ; aa) 



d(cr 



(7=0 



□ 



The last lemma has a useful consequence that we will exploit in the ensuing proof of Lemma
6(a3).

Corollary 18. Let $\mathcal{F}_{\alpha,\varepsilon}(Q)$ be defined as per Eq. (C.11) and $\mathcal{F}_t(Q)$ defined as per Eq. (C.19) with
$p_X$, $\alpha$, $\varepsilon$ satisfying the conditions of Lemma 6(a3). Then there exist constants $B, B', b > 0$
depending on $p_X$ such that

$$ \sup_{Q\in[0,1]} \big| \mathcal{F}_t(Q) - \mathcal{F}_{\alpha,\varepsilon}(Q) \big| \le B \sigma_t^b \le B' \omega^{bt/2} . $$
Proof. The second inequality follows from the first one using Lemma 6(a1). Using Eq. (C.26), we
have

°t-i G 1 a t-i S 2 f b b x 

af +1 F(al,aa t ) ' F(a t 2 _ i; «r w ) G«(a)^ A + " 

The proof of the corollary is obtained by noting that $\sigma_t = \Theta(\sigma_{t-1})$ and applying Eq. (C.25) to the
expectation in Eq. (C.19). □

Proof of Lemma d(a3). Define, as in the proof of part (a2), Qt = Rt,t-i / \o~to~t-i) , and recall that 

Q t+1 = T t (Qt) ■ 

By Corollary 1181 and Lemma [TOl it follows that Qt > 1 — ^4<U 2 * for some constants A > 0, U G (0, 1). 
Indeed 

Qt+i > FaAQt) ~ B'J t/2 > 1 - B'J'I 2 - J-; £ (l)(l - Qt) • 

and the claim follows by noting that J-' a £ (1) G (0, 1) by Lemma [TUl 

Next, consider a sequence of centered Gaussian random variables (Zt)t>o with covariance E{ZjZ s } = 
-Rt s . By triangular inequality, we have, for any t < s, 



&=t+i fc=t+i 

(C.28) 

Next consider the quantity in Eq. (|5.19p . We have 
sup P{|X + Z,| > ca s ; \X + Z t \ < ca t ) 

t,S>to 

< supF{|X + Z t \ < ca t ; X ^ 0} + sup P{|Z s /cr s | > c; |Z t /<7 t | <c; X = 0} 

t>to t,s>ta 

= supP{\X/a t + Z t \ < c ; X ^ 0} + sup > c; |Z t | < c } , (C.29) 

i>io t,s>to 

where (Z s ,Zt) are Gaussian with E{Z 2 } = E{Z 2 } = 1, and ¥,{Z s Zt} = Rt,s/ '{o~to~ s )- The first term 
in Eq. (|C.29P vanishes as to - > °o since at — > as t — > oo, and the second vanishes by Eq. (|C.28|) . □ 

C.2 Proof of Lemma 6(b): $\rho > \rho_*(\delta)$

Proof of Lemma 6 (b1), (b2). First notice that, with the definitions given in the previous section

2 

= -{(1 + a 2 )$(-a) - acj){a)} . 

Notice that the right hand side is equal to 2/5 for a = 0, monotonically decreasing in a, and vanishing 
as a — > oo. Hence there exists a m { n (e, 5) such that the right hand side is smaller than 1 if and only if 
$\alpha > \alpha_{\min}(\varepsilon, \delta)$. Further, $\sigma^2 \mapsto F(\sigma^2, \alpha\sigma)$ is concave with $F(0, 0) = 0$ and first derivative larger than
1 at $\sigma^2 = 0$ (cf. Lemma 7). It follows that for $\alpha > \alpha_{\min}(\varepsilon, \delta)$ there exists a unique $\sigma_*(\delta, p_X)$ such
that $F(\sigma^2, \alpha\sigma) > \sigma^2$ for all $\sigma \in (0, \sigma_*)$ and $F(\sigma^2, \alpha\sigma) < \sigma^2$ for $\sigma \in (\sigma_*, \infty)$. It follows that $\sigma_t^2 \to \sigma_*^2$
for any $\sigma_0 \neq 0$. This proves the first part of claim (b1).
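For concreteness, the threshold $\alpha_{\min}$ can be computed numerically. The sketch below assumes, as in the identity used later in this proof, that $\alpha_{\min}$ solves $2\{(1+\alpha^2)\Phi(-\alpha) - \alpha\phi(\alpha)\} = \delta$, and finds it by bisection, using that the left-hand side decreases monotonically to zero. The function names and the bracket $[0, 40]$ are our own choices.

```python
import math

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def tail_second_moment(alpha):
    """E[(Z - alpha)_+^2] = (1 + alpha^2) Phi(-alpha) - alpha phi(alpha)."""
    return (1.0 + alpha ** 2) * Phi(-alpha) - alpha * phi(alpha)

def alpha_min(delta, lo=0.0, hi=40.0, iters=200):
    """Solve 2 E[(Z - alpha)_+^2] = delta for delta in (0, 1) by bisection (assumed definition)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if 2.0 * tail_second_moment(mid) > delta:
            lo = mid      # still above delta: the root lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    for delta in (0.1, 0.3, 0.5, 0.9):
        a = alpha_min(delta)
        # Second column: 2 Phi(-alpha_min), which the proof compares with delta.
        print(delta, a, 2.0 * Phi(-a))
```

Since $\phi(z) > z\Phi(-z)$ for $z > 0$, the printed value $2\Phi(-\alpha_{\min})$ should exceed $\delta$ in every row.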

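Letting $\sigma_*^2 = \sigma_*^2(\alpha)$, it is easy to check that $\alpha \mapsto \sigma_*^2(\alpha)$ is continuous for $\alpha \in (\alpha_{\min}, \infty)$ with
$\lim_{\alpha\to\alpha_{\min}} \sigma_*^2(\alpha) = +\infty$ (the limit being taken from the right), and $\lim_{\alpha\to\infty} \sigma_*^2(\alpha) = \mathbb{E}\{X^2\}/\delta > 0$.
As a consequence

$$ \lim_{\alpha\to\alpha_{\min}} \mathbb{P}\{ |X + \sigma_* Z| \ge \alpha\sigma_* \} = 2\Phi(-\alpha_{\min}) , \tag{C.30} $$

$$ \lim_{\alpha\to\infty} \mathbb{P}\{ |X + \sigma_* Z| \ge \alpha\sigma_* \} = 0 . \tag{C.31} $$

Notice that by the definition of $\alpha_{\min}$ given above, we have

$$ 2\Phi(-\alpha_{\min}) - 2\alpha_{\min} \{ \phi(\alpha_{\min}) - \alpha_{\min} \Phi(-\alpha_{\min}) \} = \delta . $$

Since $\phi(z) > z\Phi(-z)$ for $z > 0$, it follows that $\lim_{\alpha\to\alpha_{\min}} \mathbb{P}\{ |X + \sigma_* Z| \ge \alpha\sigma_* \} > \delta$. We define

$$ \alpha_0(\delta, p_X) = \sup\big\{ \alpha > \alpha_{\min}(\varepsilon, \delta) : \mathbb{P}\{ |X + \sigma_* Z| \ge \alpha\sigma_* \} \ge \delta \big\} . \tag{C.32} $$

By the above $\alpha_0 \in (\alpha_{\min}, \infty)$. Further, by continuity, for $\alpha = \alpha_0$, $\mathbb{P}\{ |X + \sigma_* Z| \ge \alpha\sigma_* \} = \delta$. We thus
proved claim (b2).

In order to prove the second statement in (b1), we proceed analogously to part (a2), and define
$Q_t = R_{t,t-1}/(\sigma_t \sigma_{t-1})$. This sequence satisfies the recursion (C.18) with $\mathcal{F}_t$ defined as per Eq. (C.19).
As $t \to \infty$ we have $\sigma_t \to \sigma_*$ and hence $\mathcal{F}_t$ converges uniformly to a limit that we denote by an abuse
of notation $\mathcal{F}_{\alpha,\delta,p_X}$, where

$$ \mathcal{F}_{\alpha,\delta,p_X}(Q) \equiv \frac{1}{\delta}\, \mathbb{E}\Big\{ \Big[ \eta\Big( \frac{X}{\sigma_*} + Z_1; \alpha \Big) - \frac{X}{\sigma_*} \Big] \Big[ \eta\Big( \frac{X}{\sigma_*} + Z_2; \alpha \Big) - \frac{X}{\sigma_*} \Big] \Big\} . \tag{C.33} $$

Proceeding as in the proof of Lemma 10 we conclude that $Q \mapsto \mathcal{F}_{\alpha,\delta,p_X}(Q)$ is increasing and convex
on $[0, 1]$. Further (for $Z \sim N(0, 1)$)

$$ \mathcal{F}_{\alpha,\delta,p_X}(1) = \frac{1}{\delta}\, \mathbb{E}\Big\{ \Big[ \eta\Big( \frac{X}{\sigma_*} + Z; \alpha \Big) - \frac{X}{\sigma_*} \Big]^2 \Big\} = \frac{1}{\sigma_*^2}\, F(\sigma_*^2, \alpha\sigma_*) = 1 . \tag{C.34} $$

Finally, for $\alpha > \alpha_0(\delta, p_X)$,

$$ \frac{\mathrm{d}}{\mathrm{d}Q} \mathcal{F}_{\alpha,\delta,p_X}(Q) \Big|_{Q=1} = \frac{1}{\delta}\, \mathbb{P}\Big\{ \Big| \frac{X}{\sigma_*} + Z \Big| \ge \alpha \Big\} < 1 , \tag{C.35} $$

and therefore $\mathcal{F}_{\alpha,\delta,p_X}(Q) > Q$ for all $Q \in [0, 1)$. Hence, proceeding again as in the proof of part (a2)
we conclude that $\lim_{t\to\infty} Q_t = 1$ and therefore $\lim_{t\to\infty} R_{t,t-1} = \sigma_*^2$ as claimed. □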

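Proof of Lemma 6(b3). Throughout this proof we fix $p_X = (1 - \varepsilon)\delta_0 + \varepsilon\gamma$, $\delta \in (\varepsilon, \delta_*(\varepsilon))$. By part
(b1), we have $\lim_{t\to\infty} \mathbb{E}\{ |\eta(X + \sigma_t Z; \alpha\sigma_t)| \} = \mathbb{E}\{ |\eta(X + \sigma_* Z; \alpha\sigma_*)| \}$. It is therefore sufficient to prove
that $\mathbb{E}\{ |\eta(X + \sigma_* Z; \alpha\sigma_*)| \} < \mathbb{E}\{ |X| \}$.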

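Consider the function $\mathcal{E} : (\sigma^2, \theta) \mapsto \mathcal{E}(\sigma^2, \theta)$ defined on $\mathbb{R}_+ \times \mathbb{R}_+$ by

$$ \mathcal{E}(\sigma^2, \theta) = -\frac{1}{2} (1 - \delta) \frac{\sigma^2}{\theta} + \mathbb{E} \min_{s\in\mathbb{R}} \Big\{ \frac{1}{2\theta} (s - X - \sigma Z)^2 + |s| \Big\} , \tag{C.36} $$

where expectation is taken with respect to $X \sim p_X$ and $Z \sim N(0, 1)$. Notice that the minimum over
$s \in \mathbb{R}$ is uniquely achieved at $s = \eta(X + \sigma Z; \theta)$. It is not hard to compute the partial derivatives

$$ \frac{\partial \mathcal{E}}{\partial \theta}(\sigma^2, \theta) = -\frac{\delta}{2\theta^2} \Big\{ \Big( 1 - \frac{2}{\delta}\, \mathbb{P}\{ |X + \sigma Z| \ge \theta \} \Big) \sigma^2 + F(\sigma^2, \theta) \Big\} , \tag{C.37} $$

$$ \frac{\partial \mathcal{E}}{\partial \sigma^2}(\sigma^2, \theta) = \frac{\delta}{2\theta} \Big\{ 1 - \frac{1}{\delta}\, \mathbb{P}\{ |X + \sigma Z| \ge \theta \} \Big\} , \tag{C.38} $$

where $F(\sigma^2, \theta)$ is defined as per Eq. (C.1). Using these expressions in Eq. (C.36) we conclude that

$$ \frac{\partial \mathcal{E}}{\partial \theta}(\sigma^2, \theta) = \frac{\partial \mathcal{E}}{\partial \sigma^2}(\sigma^2, \theta) = 0 \ \Rightarrow\ \mathcal{E}(\sigma^2, \theta) = \mathbb{E}\{ |\eta(X + \sigma Z; \theta)| \} . \tag{C.39} $$

In particular, one can check from Eqs. (C.37), (C.38) that a stationary point$^4$ is given by setting
$\sigma = \sigma_*(\delta, p_X)$ and $\theta = \theta_*(\delta, p_X) = \alpha_0(\delta, p_X)\, \sigma_*(\delta, p_X)$.

Define $E(\sigma^2) = \mathcal{E}(\sigma^2, \alpha_0(\delta, p_X)\sigma)$. Using again Eqs. (C.37), (C.38) we get

$$ \frac{\mathrm{d}E}{\mathrm{d}\sigma^2}(\sigma^2) = \frac{\delta}{4\alpha_0 \sigma^3} \big\{ \sigma^2 - F(\sigma^2, \alpha_0\sigma) \big\} . \tag{C.40} $$

In particular, as a consequence of Lemma 7 and of the analysis at point (b1), we have $\frac{\mathrm{d}E}{\mathrm{d}\sigma^2} < 0$ for
$\sigma^2 \in (0, \sigma_*^2)$. Therefore, setting $\alpha = \alpha_0(\delta, p_X)$, we have

$$ \mathbb{E}\{ |\eta(X + \sigma_* Z; \alpha\sigma_*)| \} = E(\sigma_*^2) < \lim_{\sigma\to 0} E(\sigma^2) $$
$$ = -\lim_{\sigma\to 0} \frac{\sigma}{2\alpha} (1 - \delta) + \lim_{\sigma\to 0} \frac{1}{2\alpha\sigma}\, \mathbb{E}\big\{ [\eta(X + \sigma Z; \alpha\sigma) - X - \sigma Z]^2 \big\} + \lim_{\sigma\to 0} \mathbb{E}\{ |\eta(X + \sigma Z; \alpha\sigma)| \} $$
$$ = \mathbb{E}\{ |X| \} . $$

Here the first two limits vanish: the first is linear in $\sigma$, and since $|\eta(y; \theta) - y| \le \theta$ the second
expectation is at most $\alpha^2\sigma^2$, so the second term is at most $\alpha\sigma/2$.

This concludes the proof.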



□ 
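The proof above relies on the fact that the inner minimization over $s$ in the definition of $\mathcal{E}$ is solved by soft thresholding. The following sketch checks this numerically for a few values of $y = X + \sigma Z$ and $\theta$, by minimizing $s \mapsto (s - y)^2/(2\theta) + |s|$ with a ternary search (valid since the objective is strictly convex) and comparing with $\eta(y; \theta)$. The function names and test points are illustrative assumptions.

```python
import math

def eta(y, theta):
    """Soft thresholding: sign(y) * max(|y| - theta, 0)."""
    return math.copysign(max(abs(y) - theta, 0.0), y)

def objective(s, y, theta):
    """Inner objective of the variational formula: (s - y)^2 / (2 theta) + |s|."""
    return (s - y) ** 2 / (2.0 * theta) + abs(s)

def argmin_by_search(y, theta, iters=200):
    """Minimize the strictly convex objective over s by ternary search."""
    lo, hi = -abs(y) - 1.0, abs(y) + 1.0   # the minimizer lies between 0 and y
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if objective(m1, y, theta) < objective(m2, y, theta):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    for y, theta in [(2.0, 0.5), (-1.3, 0.4), (0.2, 0.5), (-0.1, 1.0)]:
        # The two printed minimizers should agree up to search tolerance.
        print(y, theta, argmin_by_search(y, theta), eta(y, theta))
```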



D Reference results 

The following calculus fact is used in the main text. 

Lemma 12. For all $s, x > 0$ we have $x^s \le (s/e)^s\, e^x$.

Proof. Since $f(x) = \ln(x)$ for $x > 0$ is concave, when $x > s$ then

$$ \frac{f(x) - f(s)}{x - s} \le f'(s) = \frac{1}{s} . \tag{D.1} $$

This is equivalent to $(x/s)^s \le e^{x-s}$ which proves the result. The case of $x < s$ is proved similarly. □
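A quick numerical check of Lemma 12 can be carried out in logarithmic form, where the claim $x^s \le (s/e)^s e^x$ reads $s \ln x \le s \ln s - s + x$. The sketch below evaluates the gap between the two sides on a small grid; the grid values and the function name `log_gap` are arbitrary choices made for illustration.

```python
import math

def log_gap(s, x):
    """log of the right-hand side minus log of the left-hand side in Lemma 12.

    The lemma asserts that this quantity is non-negative for all s, x > 0,
    with equality exactly at x = s.
    """
    return (s * math.log(s) - s + x) - s * math.log(x)

if __name__ == "__main__":
    for s in (0.1, 0.5, 1.0, 2.0, 7.5):
        gaps = [log_gap(s, x) for x in (0.01, 0.3, 1.0, 2.0, 7.5, 40.0)]
        # Smallest gap on the grid, and the gap at the equality point x = s.
        print(s, min(gaps), log_gap(s, s))
```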

We also use an estimate on the minimum singular value of perturbed rectangular matrices, which
was proved in [BC10, Theorem 1.1].

Theorem 10. For $M, N \in \mathbb{N}$, $N \le (1 - a)M$, let $B \in \mathbb{R}^{M\times N}$, $\|B\|_2 \le 1/a$ be any deterministic
matrix and $G \in \mathbb{R}^{M\times N}$ be a matrix with i.i.d. entries $G_{ij} \sim N(0, 1/M)$. Then there exist constants
$a_1, a_2$ depending only on $a$ and bounded for $a > 0$ such that, for all $z < a_2$,

$$ \mathbb{P}\{ \sigma_N(B + \nu G) \le \nu z \} \le (a_1 z)^{M-N+1} . \tag{D.2} $$

4 Indeed this is the unique saddle point of the function $(\sigma^2, \theta) \mapsto \mathcal{E}(\sigma^2, \theta)$ as it can be proved by the general
minimax theorem.






References 



[AGZ09] G. W. Anderson, A. Guionnet, and O. Zeitouni, An introduction to random matrices,
Cambridge University Press, 2009.

[ALPTJ11] R. Adamczak, A.E. Litvak, A. Pajor, and N. Tomczak-Jaegermann, Restricted isometry
property of matrices with independent columns and neighborly polytopes by random
sampling, Constructive Approximation (2011), 61-88.

[AS92] R. Affentranger and R. Schneider, Random projections of regular simplices, Discr. and
Comput. Geometry 7 (1992), 219-226.

[BC10] P. Buergisser and F. Cucker, Smoothed analysis of Moore-Penrose inversion, SIAM J.
Matr. Anal. and Appl. (2010), no. 31, 2769-2783.

[BM12] M. Bayati and A. Montanari, The LASSO risk for Gaussian matrices, IEEE Trans. on
Inform. Theory 58 (2012), 1997-2017.

[BS98] Z. Bai and J. Silverstein, No eigenvalues outside the support of the limiting spectral
distribution of large-dimensional sample covariance matrices, Ann. Probab. 26 (1998),
316-345.

[BS05] Z. Bai and J. Silverstein, Spectral Analysis of Large Dimensional Random Matrices, Springer, 2005.

[DMM09] D. L. Donoho, A. Maleki, and A. Montanari, Message Passing Algorithms for Compressed
Sensing, Proceedings of the National Academy of Sciences 106 (2009), 18914-18919.

[DMM11] D. L. Donoho, A. Maleki, and A. Montanari, The Noise Sensitivity Phase Transition in
Compressed Sensing, IEEE Trans. on Inform. Theory 57 (2011), 6920-6941.

[Don05a] D. L. Donoho, High-dimensional centrally symmetric polytopes with neighborliness
proportional to dimension, Discrete Comput. Geom. (2005), 617-652.

[Don05b] D. L. Donoho, Neighborly polytopes and sparse solution of underdetermined linear equations,
Technical Report, Statistics Department, Stanford University, 2005.

[DT05a] D. L. Donoho and J. Tanner, Neighborliness of randomly-projected simplices in high
dimensions, Proceedings of the National Academy of Sciences 102 (2005), no. 27, 9452-9457.

[DT05b] D. L. Donoho and J. Tanner, Sparse nonnegative solution of underdetermined linear equations
by linear programming, Proceedings of the National Academy of Sciences 102 (2005), no. 27, 9446-9451.

[DT09] D. L. Donoho and J. Tanner, Counting faces of randomly projected polytopes when the projection
radically lowers dimension, Journal of American Mathematical Society 22 (2009), 1-53.

[DT11] D. L. Donoho and J. Tanner, Observed universality of phase transitions in high-dimensional
geometry, with implications for modern data analysis and signal processing,
Phil. Trans. R. Soc. A (2011), 4273-4293.






[KWT09] Y. Kabashima, T. Wadayama, and T. Tanaka, A typical reconstruction limit for 
compressed sensing based on Lp-norm minimization, J. Stat. Mech. (2009), L09003. 

[Lub07] D. S. Lubinsky, A survey of weighted polynomial approximation with exponential weights, 
Approximation Theory 3 (2007), 1-105. 

[MAYB11] A. Maleki, L. Anitori, A. Yang, and R. Baraniuk, Asymptotic Analysis of Complex 
LASSO via Complex Approximate Message Passing (CAMP), arXiv:1108.0477, 2011. 

[Ran11] S. Rangan, Generalized Approximate Message Passing for Estimation with Random 
Linear Mixing, IEEE Intl. Symp. on Inform. Theory (St. Petersburg), August 2011. 

[RFG09] S. Rangan, A. K. Fletcher, and V. K. Goyal, Asymptotic analysis of MAP estimation via 
the replica method and applications to compressed sensing, Neural Information 
Processing Systems (NIPS) (Vancouver), 2009. 

[Sch10] P. Schniter, Turbo Reconstruction of Structured Sparse Signals, Proceedings of the 
Conference on Information Sciences and Systems (Princeton), 2010. 

[TV12] T. Tao and V. Vu, Random matrices: The Universality phenomenon for Wigner 
ensembles, arXiv:1202.0068, 2012. 

[VS92] A. M. Vershik and P. V. Sporyshev, Asymptotic behavior of the number of faces of 
random polyhedra and the neighborliness problem, Selecta Math. Soviet. 11 (1992), 181-201. 






